R Programming Homework Help
R is a language and an environment for statistical computing and graphics. In this assignment we use R for data analysis, manipulation and visualization. Data collected from global agencies depicts technology usage around the world. Because the collected data is often messy and vague, it first goes through a cleaning process before it is ready for analysis and visualization. We make the data from the different files comparable, produce visualizations to examine data quality, and finally interpret and report the results of the analysis.
We will use the R programming language to manipulate the data and draw conclusions from it. At the end we will perform a linear regression using the data we have chosen.
Firstly, we created three .csv files from the three .xlsx files that were provided. Comma-separated files are easily readable by R. All three .csv files were loaded into R and transformed into data frames. Some of the columns were omitted, as instructed in the assignment.
The first two dataframes created from the .csv files had the same number of columns; the third had fewer:
>length(CellPhones)
[1] 23
>length(InternetUsers)
[1] 23
>length(PerComp)
[1] 18
R script
#**********************************************
# Assignment Name: week 12
# Data Scientist Name:
# Completion Date:
# Project Description: This project uses R to manipulate data tables.
# The cleaned data is then examined and described.
# Linear regression is performed on the data and conclusions are drawn.
# Consultations:
# Citations:
# http://stackoverflow.com/questions/4862178/remove-rows-with-nas-in-data-frame
# http://stackoverflow.com/questions/12486264/average-across-columns-in-r-excluding-nas
# http://onlinestatbook.com/2/describing_bivariate_data/pearson.html
#***********************************************
#loading the data Cellphones
D <- read.csv("Cp.csv", header = TRUE)
head(D)
#choosing the columns
cn <- D[,1]
cy <- D[,27:48]
#creating dataframe
CellPhones <- data.frame(cn, cy)
head(CellPhones)
The first two dataframes, CellPhones and InternetUsers, cover the same span of years, 1990-2011; the data manipulation was performed so as to create this situation. The third dataframe, PerComp, covers only 18 years, from 1990 to 2006. It was therefore necessary to estimate the remaining years 2007-2011 according to the following formulas, in which each year stands for that year's value:
 2007 = 2006 + (((2006 - 2000) / 2000) * 2006)
 2008 = 2007 + (((2006 - 2001) / 2001) * 2006)
 2009 = 2008 + (((2006 - 2002) / 2002) * 2006)
 2010 = 2009 + (((2006 - 2003) / 2003) * 2006)
 2011 = 2010 + (((2006 - 2004) / 2004) * 2006)
These new variables are added to the PerComp dataframe, which now has the same number of columns as the other two:
>length(PerComp)
[1] 23
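As a quick sanity check of the extrapolation rule, the 2007 formula can be evaluated on made-up numbers (the values below are illustrative, not taken from the real PerComp data):

```r
# Hypothetical per-100-people PC values for one country (illustrative only)
x2000 <- 5
x2006 <- 20

# 2007 estimate: grow the 2006 value by the relative change from 2000 to 2006
x2007 <- x2006 + ((x2006 - x2000) / x2000) * x2006
x2007  # 20 + (15/5)*20 = 80
```

The same pattern, with the comparison year sliding from 2000 to 2004, produces the 2008-2011 estimates.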
Since it is not wise to monitor the changes in each year individually, as each market and its consumption can vary drastically, I created three year groups. The first group covers the data from 1990-1996, the second from 1997-2003 and the third from 2004-2011. These groups were created using the mean function in R: each group is filled with the mean of the data values for the years it covers. I still haven't removed the missing data values, as doing so would make the prediction of PerComp for the years 2007-2011 extremely poor. In other words, each of the nine groups (3 from CellPhones, 3 from InternetUsers and 3 from PerComp) still has some missing values in it.
Namely, the missing values in the groups are caused by the following:
 In the CellPhones and InternetUsers dataframes, missing values occur only if all the members of a year group were missing in the original data.
 In the PerComp dataframe, missing values are caused either by data that is genuinely missing, or by a missing predictor in the prediction of the extrapolated columns. If even one of the predictors is missing, the predicted columns will be filled with NA's. This dataframe is therefore the one with the least information in it, as most of it is omitted.
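Both behaviours follow from how rowMeans handles missing values when na.rm = TRUE: a partially missing group still gets a mean from the available years, while a fully missing group yields NaN. A tiny self-contained sketch with toy numbers:

```r
# Toy matrix: three "countries" (rows) over three "years" (columns)
m <- rbind(c(1, 2, NA),     # one year missing: mean uses the remaining years
           c(4, 5, 6),      # complete row: ordinary mean
           c(NA, NA, NA))   # all years missing: the group mean becomes NaN
g <- rowMeans(m, na.rm = TRUE)
g  # 1.5, 5, NaN
```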
Now that I have three variables, each with three year groups, I have merged them into one dataframe. This dataframe has 10 columns, as the location variable is included:
>length(data)
[1] 10
The total number of rows in this data frame is:
>nrow(data)
[1] 185
This data has missing values in it. At this point it was necessary to remove the NA's so that the statistics could be computed. I removed every row of the merged dataframe that had at least one missing value in any column. The new dataframe with the rows removed contains far fewer observations than the previous one:
>nrow(data)
[1] 108
Even though quite a lot of information was lost because rows were omitted for having even one missing value, this was necessary so that the descriptive statistics could be computed.
I have created the descriptive statistics table which is shown below:
In the variable names, c stands for cell phone users, i for internet users and p for personal computer users, and the letter g is short for group. There are three groups of each variable, corresponding to the three time periods.
               cg1        cg2        cg3        ig1        ig2       ig3        pg1       pg2       pg3
Min.       0          0.04689     3.388     0           0.0951    0.8286    0.02333    0.13      0.465
1st Qu.    0.01353    2.96124    47.087     0.01742     1.0637    9.0648    0.29       1.244     8.312
Median     0.25124   11.53497    80.678     0.13799     3.8648   24.1396    1.25       4.543    22.252
Mean       1.6345    21.04866    77.667     1.03065    10.7953   32.3304    4.56173   11.654    55.377
3rd Qu.    1.81743   34.47994   107.071     0.94704    16.3192   54.0559    4.40336   13.316    90.245
Max.      13.57574   73.30983   161.68     11.8016     51.6256   90.2737   37.7       58.43    581.505
Std. Dev.  2.894409  21.86306    35.59035   2.024625   14.11633  26.6115    7.353599  15.3171   79.66879
In the table there are means, medians, minimums, maximums and standard deviations of each year group of each of the variables.
#loading the data Internet
D <- read.csv("Internet.csv", header = TRUE)
#creating dataframe
InternetUsers <- data.frame(D)
head(InternetUsers)
#loading the data personal Computers
D <- read.csv("Pc.csv", header = TRUE)
head(D)
#choosing columns
p <- D[,1:18]
#creating data frame
PerComp <- data.frame(p)
#creating predictions
#predictions will fail if one of the predictors is NA.
#I have not excluded the failures because we would get wrong results otherwise
#All NA's will be omitted later when the final data frame is created.
PerComp$X2007 <- PerComp$X2006 + (((PerComp$X2006 - PerComp$X2000) / PerComp$X2000) * PerComp$X2006)
PerComp$X2008 <- PerComp$X2007 + (((PerComp$X2006 - PerComp$X2001) / PerComp$X2001) * PerComp$X2006)
PerComp$X2009 <- PerComp$X2008 + (((PerComp$X2006 - PerComp$X2002) / PerComp$X2002) * PerComp$X2006)
PerComp$X2010 <- PerComp$X2009 + (((PerComp$X2006 - PerComp$X2003) / PerComp$X2003) * PerComp$X2006)
PerComp$X2011 <- PerComp$X2010 + (((PerComp$X2006 - PerComp$X2004) / PerComp$X2004) * PerComp$X2006)
head(PerComp)
This table clearly shows that the mean of each variable increases from one year group to the next, and the standard deviations and medians increase accordingly. Only the last year group of the Personal Computer users variable has an extremely high maximum and thus a very large standard deviation.
Pearson’s Product Moment Correlation results are shown in the table below:
        cg1       cg2       cg3       ig1       ig2       ig3       pg1       pg2       pg3
cg1  1
cg2  0.775607  1
cg3  0.503776  0.767462  1
ig1  0.826473  0.652487  0.417134  1
ig2  0.865827  0.848995  0.592029  0.87648   1
ig3  0.756207  0.888611  0.76612   0.738805  0.916572  1
pg1  0.748333  0.747019  0.509166  0.832976  0.873389  0.805167  1
pg2  0.824182  0.850373  0.586851  0.826223  0.951015  0.884521  0.928464  1
pg3  0.290429  0.362988  0.238245  0.273848  0.443159  0.444679  0.368071  0.441712  1
I can see that there are some strong correlations between the variables. Most of the strong correlations are between two adjacent year groups of the same variable, which was expected, as this data is not truly independent but rather dependent as a time series.
However, it can be seen that some different variables are also strongly correlated. For example, Personal Computer users in the second time period are strongly correlated with cell phone users in the first time period. The same can be said for Personal Computer users and Internet users in the second period. Plots give a visual insight into the relationships between the variables.
I have plotted each pair of variables from the same time period. The first plot shows how the number of cell phone users depends on the number of internet users throughout the three time groups.
It can be seen that in each time period the number of cell phone users is directly proportional to the number of internet users. This is clear from the linear regression line drawn through each of the plots. In each plot the line is rising, meaning that these two variables are positively correlated.
#checking the dimensions of the dataframes
length(CellPhones)
length(InternetUsers)
length(PerComp)
nrow(CellPhones)
nrow(InternetUsers)
nrow(PerComp)
#creating groups by years
#I am using the function "rowMeans", which calculates the mean across the selected columns for each row
#NA's are omitted from the calculations
#cell phone groups
cellphoneYG9096 <- rowMeans(cbind(CellPhones[,2:8]), na.rm=TRUE)
cellphoneYG9703 <- rowMeans(cbind(CellPhones[,9:15]), na.rm=TRUE)
cellphoneYG0411 <- rowMeans(cbind(CellPhones[,16:23]), na.rm=TRUE)
#internet groups
internetYG9096 <- rowMeans(cbind(InternetUsers[,2:8]), na.rm=TRUE)
internetYG9703 <- rowMeans(cbind(InternetUsers[,9:15]), na.rm=TRUE)
internetYG0411 <- rowMeans(cbind(InternetUsers[,16:23]), na.rm=TRUE)
#PC groups
personalYG9096 <- rowMeans(cbind(PerComp[,2:8]), na.rm=TRUE)
personalYG9703 <- rowMeans(cbind(PerComp[,9:15]), na.rm=TRUE)
personalYG0411 <- rowMeans(cbind(PerComp[,16:23]), na.rm=TRUE)
Next, I produced three scatterplots showing how the number of internet users depends on the number of personal computer users throughout the three year groups.
Again there is a positive linear correlation, which implies that the more PC users there are, the more internet users there will be.
In the third graph, corresponding to the year group from 2004 to 2011, I have excluded one observation that I identified as an outlier. I did so by changing the range of the x-axis so as not to show that particular observation.
I have also plotted scatterplots of the remaining combination of variables, showing how the number of PC users depends on the number of cell phone users.
As can be seen, there is a positive correlation between the two variables at each time period.
By using these three sets of scatterplots I have visualized the relationships between the three variables. The conclusion is that the increase in any sort of technology induces the increase in the other sorts of technology in each epoch.
I can also clearly see that in the earliest epoch, from 1990 to 1996, there were many observations with low numbers of technology users. With the progress of time these clusters have dispersed throughout the scatterplots, implying that technology has become more accessible and more people are using it.
#attaching data so that I can access variables easily
attach(data)
#creating the summary of all the columns corresponding to variables
summary(data[,2:10])
#creating an empty array to store all the standard deviations
deviations <- rep(0, times=9)
#creating a loop to calculate and store the standard deviation of each variable
for (i in 2:10)
{
deviations[i-1] <- sd(data[,i])
}
#listing all standard deviations from first to last variable
deviations
#creating a matrix of Pearson's correlation coefficients
cor(data[,2:10], method="pearson")
#parameters of the plot display device
#three rows and one column
par(mfrow=c(3,1))
#plotting three plots of the three year groups of Cell phones vs Internet users
plot(ig1, cg1, xlab="internet users", ylab="Cellular phone users", main="years 1990-1996")
Now that I have explored all the available data I will construct a linear regression model to test a hypothesis.
My assumption addresses the question of technology availability to consumers. Namely, I assume that a lack of internet access and a lack of available cell phones resulted in an increased relative number of PC users. In other words, I assume that good internet coverage and the availability of cell phones decrease the relative number of PC users.
I will test my assumption by monitoring how the number of PC users depends on the difference between the number of internet users and the number of cell phone users. In other words, I will test whether, to some degree, people prefer cell phones with wireless internet coverage over PCs with a good internet connection. The assumption will be tested by monitoring the behavior of the same linear regression model over the three time periods. Since cell phone technology was not as accessible in the first two time periods as it is in the third, I expect the model to behave drastically differently across these three epochs.
I have created a variable equal to the difference between Internet users and Cell Phone users for each epoch. As the dependent variable in the model I use the number of PC users from the corresponding epoch.
Every hypothesis test is conducted so that the null hypothesis is either rejected or not rejected. My null hypothesis is that the number of PC users does not depend on the difference between Internet users and Cell Phone users.
I have run my linear regression in the first time period, pg1 ~ (ig1 - cg1), and the summary is:
Call:
lm(formula = y1 ~ x1)
Residuals:
    Min      1Q  Median      3Q     Max
-9.0102 -0.6171 -0.4776  0.0275  9.7286
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.46163    0.24585   1.878   0.0632 .
x1           0.33216    0.03643   9.117 5.46e-15 ***
Residual standard error: 2.177 on 106 degrees of freedom
Multiple R-squared: 0.4395, Adjusted R-squared: 0.4342
F-statistic: 83.11 on 1 and 106 DF, p-value: 5.458e-15
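Only the fitted summary is shown above; the model itself has this form. The sketch below rebuilds it on simulated stand-in vectors (the ig1, cg1 and pg1 here are made up for illustration, not the real year-group data):

```r
set.seed(1)
# Simulated stand-ins for the epoch-1 year groups (illustrative only)
ig1 <- runif(108, 0, 12)               # internet users
cg1 <- runif(108, 0, 14)               # cell phone users
pg1 <- 0.3 * (ig1 - cg1) + rnorm(108)  # PC users, built to depend on the difference

x1 <- ig1 - cg1    # predictor: difference of internet users and cell phone users
y1 <- pg1          # response: PC users in the same epoch
fit <- lm(y1 ~ x1)
summary(fit)       # the p-value on the x1 slope is the quantity being tested
```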
From the regression summary I see that the p-value is extremely small, almost identical to zero. This means that the difference between Internet users and Cell Phone users is a significant predictor of the number of PC users. Thus the null hypothesis must be rejected in this epoch.
#dataframe for cellphones with four columns: Location and three year groups
c <- data.frame(CellPhones[,1], cellphoneYG9096, cellphoneYG9703, cellphoneYG0411)
colnames(c) <- c("country", "cg1", "cg2", "cg3")
#dataframe for internet with four columns: Location and three year groups
i <- data.frame(InternetUsers[,1], internetYG9096, internetYG9703, internetYG0411)
colnames(i) <- c("country", "ig1", "ig2", "ig3")
#dataframe for PCs with four columns: Location and three year groups
p <- data.frame(PerComp[,1], personalYG9096, personalYG9703, personalYG0411)
colnames(p) <- c("country", "pg1", "pg2", "pg3")
#creating one big dataframe with 10 columns
#first column is the location, the other columns are the three year groups of the three variables:
#cellphones, internet and PCs
data <- merge(x=c, y=i)
data <- merge(x=data, y=p)
#checking to see if the merged data frame is good
head(data)
#there are NA's in the data frame
nrow(data)
#Omit all the rows containing NA's since it is impossible to do the statistics otherwise.
data <- na.omit(data)
I have run the regression in the second epoch and the summary is:
Call:
lm(formula = y2 ~ x2)
Residuals:
    Min      1Q  Median      3Q     Max
-28.206 -10.617  -5.015   5.283  54.296
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  11.1102     1.9804   5.610 1.63e-07 ***
x2            3.5728     0.4116   8.681 5.16e-14 ***
#adding a straight line which is the linear fit of these two variables
abline(lm(cg1~ig1))
plot(ig2, cg2, xlab="internet users", ylab="Cellular phone users", main="years 1997-2003")
abline(lm(cg2~ig2))
plot(ig3, cg3, xlab="internet users", ylab="Cellular phone users", main="years 2004-2011")
abline(lm(cg3~ig3))
#three plots of Internet users vs PC users
plot(pg1, ig1, xlab="PC users", ylab="Internet users", main="years 1990-1996")
#adding linear regression line
abline(lm(ig1~pg1))
plot(pg2, ig2, xlab="PC users", ylab="Internet users", main="years 1997-2003")
abline(lm(ig2~pg2))
plot(pg3, ig3, xlab="PC users", ylab="Internet users", xlim=c(0,400), main="years 2004-2011")
abline(lm(ig3~pg3))
#three plots of PC users vs Cell phone users
plot(cg1, pg1, xlab="Cellular phone users", ylab="PC users", main="years 1990-1996")
#adding linear regression line
abline(lm(pg1~cg1))
plot(cg2, pg2, xlab="Cellular phone users", ylab="PC users", main="years 1997-2003")
abline(lm(pg2~cg2))
plot(cg3, pg3, xlab="Cellular phone users", ylab="PC users", ylim=c(0,300), main="years 2004-2011")
abline(lm(pg3~cg3))
Residual standard error: 16.79 on 106 degrees of freedom
Multiple R-squared: 0.4155, Adjusted R-squared: 0.41
F-statistic: 75.35 on 1 and 106 DF, p-value: 5.16e-14
The p-value of the regression is very small, which implies that the number of PC users depends strongly on the difference between Internet users and Cell Phone users. The null hypothesis has to be rejected in the second epoch as well.
Running the regression in the third epoch yields:
Call:
lm(formula = y3 ~ x3)
Residuals:
    Min      1Q  Median      3Q     Max
-73.424 -29.697  -3.875  28.672  83.615
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 76.67945    3.87920  19.767   <2e-16 ***
x3           0.02830    0.05159   0.549    0.584
Residual standard error: 35.71 on 106 degrees of freedom
Multiple R-squared: 0.00283, Adjusted R-squared: -0.006577
F-statistic: 0.3009 on 1 and 106 DF, p-value: 0.5845
The p-value of this regression is 0.5845, which is well above the 0.1 threshold. Thus the null hypothesis cannot be rejected in the third epoch: the number of PC users does not depend on the difference between Internet users and Cell phone users in this epoch.
To summarize the results: the number of PC users strongly depended on the difference between Internet users and cell phone users in the first two epochs, and it does not depend on it in the third epoch.
This means that when the number of cell phones was very small, as in the first epoch, the availability of the Internet and the lack of cell phones induced a great number of PC users.
In the second epoch the number of cell phones was still relatively small, and the availability of the internet again induced a great number of PC users.
In the third epoch cell phones became an everyday item and the internet became available on them. In this epoch the number of PC users does not depend at all on the difference between Cell phone users and Internet users: almost all cell phone users have internet access, so this difference is negligible. People therefore tend to have a cell phone, while the number of PCs has remained more or less the same.
These regression results show that the mere availability of cell phones induced a great increase in the number of cell phones, while PCs remained in slow and constant growth. They also show that mobile internet has made the number of internet users an unreliable predictor of a preference for a particular technology: an increase in the number of Internet users implies an increase in users of both PCs and cell phones.
For further research I suggest dividing PC internet users and cell phone internet users into two distinct groups so that a more detailed analysis can be performed.