Analysing Data using Concepts and Tools Statistics Report


The purpose of this assignment is to give you an opportunity to demonstrate your skills in describing and analysing data using concepts and tools that we have developed in the course so far.

Below are instructions on how to collect a specified set of data and what to do with it.  Your goal is to produce a report in MS Word discussing the data and submit this along with a single MS Excel workbook showing your workings.   A suggested target range for the word count of the report is 700-1000 words.

I have prepared and attached an example Excel workbook which I will refer to below. Note: my Excel workbook is not a model answer.  You may choose to use different visualisations and do not necessarily need all the computed statistics and charts I have included.  It really depends on the features of the data you have, so you need to use your own judgement as to how best to present and describe the data.  Besides, the primary output for this assignment is the report itself, not the workbook.

Data collection

Collect quantitative data on two variables from the Sustainable Development Report 2021 website.

  • Go to https://dashboards.sdgindex.org/ and browse around the site to become familiar with its purpose and the information publicly available there.
  • Go to the “Downloads” page and click on “Database EXCEL” to download the database of indicators used to assess countries’ progress towards the UN Sustainable Development Goals.  You will be taking data from the “SDR2021 Data” sheet in the workbook. Starting from column AR of that sheet, there are columns of cross-country data for the SDG indicators, one row for each country.  Note that from row 195 the data are for regional blocs, so these should be excluded from the data you take.
  • You have been assigned two variables according to the last digit of your Student ID number.  You can find the variables assigned to you in the attached file “Assigned variables.xlsx”.  For example, my student ID number (a long long time ago in a galaxy not so far away) ended with 2 so I would be using variables “Poverty headcount ratio at $1.90/day (%)” and “Cereal yield (tonnes per hectare of harvested land)”.  I have chosen pairs of variables that may potentially have a statistical relationship.  If you wish, you are welcome to switch one of the variables with another one from the database that you are interested in investigating and that you think is related to the variable you retain.
  • Look up your variables in the Data Explorer on the website or in the report from page 75 (some newer variables do not yet appear on the website, it seems).  The main thing you want to understand is what a given value of each of your variables means.  E.g. I found that the “Poverty headcount ratio at $1.90/day (%)” is the estimated percentage of the population that is living under the poverty threshold of US$1.90 a day.
  • Each indicator has a number of associated columns in the Database workbook: the first column of the set has the data, and the others can be safely ignored.  So for the indicators I use in my example Excel workbook, I took data from columns QZ and GO (and only down to row 194).  Using Excel’s Find tool is a quick way to find your data.  Copy and paste the data you will use into your workbook.  You should keep the country names alongside the data so that you can identify which observation is for which country.

Prepare your data.

  • Construct separate univariate data sets for analysis.  There will probably be many countries where there is no estimate for the indicators you are looking at.  If there is no observation recorded, do not assume the observed value is zero.  In general, missing observations should rarely be replaced with zeros.  Also consider whether it is appropriate to include observations that are recorded as zero.  In my example Excel workbook I have retained observations of zero for CO2 emissions because this suggests those countries are not exporting fossil fuels, while blank cells mean there is no observation.  It is fine to have blank cells within your data ranges; Excel will usually ignore them (as long as they are truly blank).
  • Construct a bivariate (paired) data set – i.e. for each country you should have an observation for both variables.  You can see in my example Excel workbook how I use some Excel formulas and the Replace tool to blank out cells for countries where there is only an observation for one of the two variables.  If you find that the number of countries that you have left in the bivariate data set is low, say fewer than 30, it might be best to go back to the Database and replace the variable that is causing many countries to be dropped.
  • In your report you should note any difficulties with the data preparation and implications of dropping countries from the data sets if such was required.
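
The same preparation logic can be sketched outside Excel. The snippet below is only an illustration in Python, not part of the example workbook: the country names and values are made up, and `None` stands in for a blank cell. It builds the separate univariate sets (dropping blanks but keeping genuine zeros) and the paired set (keeping only countries observed on both variables).

```python
# Hypothetical rows: (country, variable_x, variable_y); None marks a blank cell.
rows = [
    ("Country A", 1.2, 3.4),
    ("Country B", None, 2.8),   # missing x: dropped from x and from the pairs
    ("Country C", 0.0, 5.1),    # a recorded zero is a real observation
    ("Country D", 7.5, None),   # missing y: dropped from y and from the pairs
    ("Country E", 4.3, 1.9),
]

# Univariate sets: drop only the blanks, never replace them with zero.
x_values = [x for _, x, _ in rows if x is not None]
y_values = [y for _, _, y in rows if y is not None]

# Bivariate (paired) set: keep a country only if both values are present.
pairs = [(name, x, y) for name, x, y in rows if x is not None and y is not None]

print(len(x_values), len(y_values), len(pairs))
```

Comparing the three counts shows how many countries are lost when moving from the separate data to the paired data, which is the kind of implication the report should mention.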

An educated guess

Guess the average value for each variable.

  • Run your eye down the column of univariate data you have for each variable (the separate data not the paired data), and make a guess what you think the cross-country average would be for each one.  Do not use Excel to calculate the averages here.
  • Just take a note of your guesses; you will use them later.

Data description

Use numerical summary measures and graphical representations to describe the two variables (using the separate data).

  • You can use the “Descriptive Statistics” tool in the Analysis ToolPak and also calculate quartiles, coefficients of variation, etc.
  • Draw a histogram, boxplot, etc. for each data set.
  • You should discuss the important and interesting features of the data revealed by your descriptive statistics and graphical representations in your report.  In my example workbook you will see the CO2 data is strongly positively skewed, so much so that the boxplot is almost meaningless.  The two options I had were to drop some of the largest observations or to transform the data.  I chose the latter – by taking the log of the data I end up with a distribution that can be usefully presented on a boxplot or histogram.  Outliers and skewness are common features of cross-country data like this, so you should be prepared to drop observations or transform data if necessary, and explain why you did this in your report.  (Just because data is skewed doesn’t mean you have to transform it!  You’ll notice I did not transform the literacy data.)
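
As a rough cross-check on the Excel output, the summary measures and the log transform mentioned above can be sketched in Python. The sample below is invented to be positively skewed and is not taken from the SDR database. Quartile conventions differ between tools, so the cut points need not match Excel's exactly.

```python
import math
import statistics

# Hypothetical, positively skewed sample (not taken from the SDR database).
data = [0.5, 0.8, 1.1, 1.5, 2.0, 2.6, 3.9, 7.4, 18.0, 95.0]

mean = statistics.mean(data)
median = statistics.median(data)
stdev = statistics.stdev(data)                 # sample standard deviation
q1, q2, q3 = statistics.quantiles(data, n=4)   # quartile cut points
cv = stdev / mean                              # coefficient of variation

# Natural-log transform. It is only defined for strictly positive values, so
# recorded zeros would have to be dropped or handled separately first.
log_data = [math.log(v) for v in data if v > 0]

print(f"mean={mean:.2f} median={median:.2f} cv={cv:.2f}")
print(f"log mean={statistics.mean(log_data):.2f} "
      f"log median={statistics.median(log_data):.2f}")
```

A mean well above the median is the numerical signature of the positive skew; after the log transform the two measures sit much closer together.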

Use numerical summary measures and graphical representations to consider if there might be a relationship between the two variables (using the paired data).

  • Use the correlation coefficient and a scatterplot to see the strength and direction of the relationship (if any) between the two variables.
  • In your report, discuss the above and explain why you think the relationship might be causative, spurious, or driven by a third factor.
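
For reference, the correlation coefficient used here is the usual Pearson sample correlation, the quantity Excel's CORREL function returns. A minimal Python sketch with invented paired data:

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient for paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data: y tends to fall as x rises.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [9.8, 8.1, 6.2, 3.9, 2.0]
print(round(pearson_r(x, y), 3))
```

The sign gives the direction of the relationship and the magnitude its strength. Neither says anything about whether the relationship is causal.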

Data analysis

Construct confidence intervals (using the separate data).

  • Now assume that the data for each variable is a random sample and construct a confidence interval for the population mean of each variable.  Since you don’t know the population standard deviations you should use critical values from the Student t-distribution.
  • State your confidence interval in your report, explain what it means (to a layperson), and discuss whether you have any doubts about the validity of the interval.
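
The interval has the form sample mean ± t critical value × (s / √n), with n − 1 degrees of freedom. In Excel the critical value would normally come from a function such as T.INV.2T. The Python sketch below swaps in a small numerical routine (Simpson's rule plus bisection) so that it runs with the standard library only. The function names and the sample are illustrative assumptions, not part of the assignment.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] and symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(confidence, df):
    """Two-sided critical value, found by bisection on the CDF."""
    target = 0.5 + confidence / 2
    low, high = 0.0, 1000.0
    for _ in range(60):
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

def mean_confidence_interval(values, confidence=0.95):
    """Confidence interval for the population mean, sigma unknown."""
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    margin = t_critical(confidence, n - 1) * std_err
    return mean - margin, mean + margin

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
print(mean_confidence_interval(sample))
```

The interval is only as trustworthy as the random-sample assumption behind it, which is worth questioning for a near-complete list of countries.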

Compute p-values (using the separate data).

  • Now assume that your “educated guess” of the average for each variable is the true mean of that variable.  How likely is it that you would observe the sample mean you have obtained, or something more extreme, if your parameter assumption for each variable is correct?  I.e. find the two-tail p-value associated with each sample mean.
    You can obtain the p-value by doing a two-tail hypothesis test for the mean, for each data set.
  • State the p-values in your report and explain their meaning.  Conclude by stating whether your educated guesses were probably right or wrong.  (There is no penalty if your educated guesses are wrong!)
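
The two-tail p-value comes from the test statistic t = (sample mean − guessed mean) / (s / √n), referred to a t-distribution with n − 1 degrees of freedom. Excel's T.DIST.2T does the last step. The sketch below is an assumption-laden stand-in that integrates the t density numerically using only the Python standard library. It is adequate for moderate t statistics, and the sample and guess are invented.

```python
import math
import statistics

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t, by Simpson's rule plus symmetry."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def two_tail_p_value(values, guessed_mean):
    """p-value for H0: population mean = guessed_mean (one-sample t-test)."""
    n = len(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    t_stat = (statistics.mean(values) - guessed_mean) / std_err
    return 2 * (1 - t_cdf(abs(t_stat), n - 1))

sample = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical data, mean 5
print(two_tail_p_value(sample, guessed_mean=4.0))
```

A small p-value says the observed sample mean would be unusual if the guess were the true mean. A guess equal to the sample mean gives a p-value of 1.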

Report

As noted above, your assignment output should consist of a report and a spreadsheet workbook.  Imagine that the reader of your report is a busy executive with only a basic understanding of statistics.  Your report should therefore be of professional appearance and be able to be fully understood without reference to the workbook.  I.e. paste relevant charts into the report; do not paste the full descriptive statistics table into the report but rather use an abridged table and/or discussion; do not show the computation of the confidence intervals and p-values in the report but do state and interpret them.

Remember the suggested word count is 700-1000 words but this is a guide only: if you accomplish everything required above with less, that is fine; ideally don’t go much over 1000 – this would indicate you are not being concise enough.

Finally, I have attached some collated feedback I provided to students last year.  You may like to refer to this to see what I am hoping to see in your report.

SPSS Assignment.

 

  1. From the General Social Survey, test whether the number of hours per day that people have to relax (HRSRELAX) varies by marital status (MARITAL). Hint: select ANALYZE, COMPARE MEANS, ONE-WAY ANOVA. Note also that the Tukey test is selected in the Post Hoc button and that means and standard deviations are selected in the Options button. (25 Points)

  2. Based on the General Social Survey, test whether the number of self-reports of poor mental health during the past 30 days (MNTLHLTH) varies by marital status (MARITAL) using one-way ANOVA with the Tukey test and selecting means and standard deviations as an option. (25 Points)

  3. Based on the General Social Survey, test whether the number of self-reports of poor mental health (MNTLHLTH) varies by age group (AGE recoded into a new variable AGEGRP) using one-way ANOVA with the Tukey test and selecting means and standard deviations as an option. Before doing your analysis, recode age into a new variable (AGEGRP) using the following groups: 18 to 29; 30 to 44; 45 to 65; 65 or more. (25 Points)

  4. Analyze two variables of your choice from the General Social Survey using one-way ANOVA. Remember the dependent variable needs to be an interval/ratio level variable and the independent variable needs to have three or more categories. (25 Points)
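
The assignment itself is done through the SPSS menus, but the two mechanical steps, recoding AGE into AGEGRP and forming the ANOVA F statistic, can be sketched in Python to show what the software computes. Everything below is illustrative: the records are invented, not GSS data, and because the stated groups overlap at 65 the sketch assumes that 65 belongs to the last group.

```python
def age_group(age):
    """Recode age into AGEGRP. The stated groups overlap at 65; this sketch
    ASSUMES 65 belongs to the last group (so the third group is 45 to 64)."""
    if age < 18:
        return None            # outside the groups listed in the assignment
    if age <= 29:
        return "18 to 29"
    if age <= 44:
        return "30 to 44"
    if age <= 64:
        return "45 to 65"
    return "65 or more"

def one_way_anova_f(groups):
    """F statistic = between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical (age, MNTLHLTH) records, not real GSS data.
records = [(22, 5), (25, 7), (28, 6), (33, 3), (40, 4), (41, 2),
           (50, 2), (58, 1), (63, 3), (70, 1), (75, 0), (81, 2)]
by_group = {}
for age, days in records:
    by_group.setdefault(age_group(age), []).append(days)

print(one_way_anova_f(list(by_group.values())))
```

A large F means the group means differ by more than the within-group spread would suggest. The Tukey test then indicates which pairs of groups differ.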


Statistics homework help

Reading: week2 supplemental reading. This HW has you apply that knowledge by thinking about the following problem, which can be solved with database tables:

Imagine that you have to design a database that would match information about job postings and internship opportunities for candidate NYU graduating students. (So, instead of matching patrons to books as in the reading, you are matching students to jobs for this HW.)

Assume that you currently have separate sets of information (data) about:

-Typical student information such as their identities, their grades, their courses taken, and other information, etc.

-Typical information about job postings and/or internships, such as role, requirements, location, paid/unpaid, and other characteristics, etc.

1) You would like to bring all of this information together in a relational database system.  Which data elements or variables do you think are necessary?

2) How would you structure distinct tables?  (i.e. what would constitute a row in your table(s)?)

3) How would you ensure information is stored efficiently?

4) What keys would you use to relate tables to each other? (i.e. Describe how information in one table would link with information in another table)

5) If you wanted to add the capability of ‘automatically’ matching students to jobs and vice versa, explain via an illustrative example, how this automatic matching might work for your design.

Just as the reading illustrated sample tables and their relationships, your HW submission should show sample tables and relationships, so that I can understand your design and how it would work to solve the task at hand.
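
To make the questions about tables, keys, and matching concrete, here is one possible shape such a design could take, sketched with Python's built-in `sqlite3` module. It is an illustration under assumed requirements, not a model answer: every table name, column, and sample row is hypothetical, and matching is reduced to "meets the GPA floor and holds every required skill".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, gpa REAL);
CREATE TABLE skills   (skill_id   INTEGER PRIMARY KEY, skill TEXT UNIQUE);
CREATE TABLE jobs     (job_id     INTEGER PRIMARY KEY, role TEXT,
                       location TEXT, paid INTEGER, min_gpa REAL);
-- Link tables hold only foreign keys, so each fact is stored once.
CREATE TABLE student_skills (
    student_id INTEGER REFERENCES students(student_id),
    skill_id   INTEGER REFERENCES skills(skill_id),
    PRIMARY KEY (student_id, skill_id));
CREATE TABLE job_requirements (
    job_id   INTEGER REFERENCES jobs(job_id),
    skill_id INTEGER REFERENCES skills(skill_id),
    PRIMARY KEY (job_id, skill_id));

INSERT INTO students VALUES (1, 'Student A', 3.6), (2, 'Student B', 3.1);
INSERT INTO skills   VALUES (1, 'SQL'), (2, 'Python');
INSERT INTO jobs     VALUES (10, 'Data Analyst Intern', 'New York', 1, 3.0);
INSERT INTO student_skills   VALUES (1, 1), (1, 2), (2, 1);
INSERT INTO job_requirements VALUES (10, 1), (10, 2);
""")

# A student matches a job when the GPA floor is met and every required
# skill is held, i.e. the number of matched skills equals the number required.
matches = conn.execute("""
SELECT s.name, j.role
FROM students s
JOIN jobs j ON s.gpa >= j.min_gpa
JOIN job_requirements r ON r.job_id = j.job_id
JOIN student_skills ss ON ss.student_id = s.student_id
                      AND ss.skill_id = r.skill_id
GROUP BY s.student_id, j.job_id
HAVING COUNT(*) = (SELECT COUNT(*) FROM job_requirements
                   WHERE job_id = j.job_id)
""").fetchall()
print(matches)
```

In this sketch a row of `students` is one student, a row of `jobs` is one posting, and the two link tables carry the many-to-many relationships through primary and foreign keys. Student B is left out of the result because one required skill is missing.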

Statistics homework help

Scenario

You are a data analyst for a basketball team. You have found a large set of historical data, and are working to analyze and find patterns in the data set. The coach of the team and your management have requested that you use descriptive statistics and data visualization techniques to study distributions of key variables associated with the performance of different teams. Data-driven analytics will help the management make decisions to further improve your team’s performance. You will use the Python programming language to perform your statistical analysis. You will also need to present a report of your findings to the team’s management. Since the managers are not data analysts, you will need to interpret your findings and describe their practical implications. The managers will use your report to find areas where the team can improve its performance.

Reference

FiveThirtyEight. (April 26, 2019). FiveThirtyEight NBA Elo dataset. Kaggle. Retrieved from https://www.kaggle.com/fivethirtyeight/fivethirtyeight-nba-elo-dataset/

Directions

For this project, you will submit the Python script you used to make your calculations and a summary report explaining your findings.

  1. Python Script: To complete the tasks listed below, open the Project One Jupyter Notebook link in the Assignment Information module. Your project contains the NBA data set and a Jupyter Notebook with your Python scripts. In the notebook, you will find step-by-step instructions and code blocks that will help you complete the following tasks:
    • Choose and create a data visualization.
    • Calculate descriptive statistics including mean, median, min, max, variance, and standard deviation.
    • Construct confidence intervals for a population proportion and a population mean.
  2. Summary Report: Once you have completed all the steps in your Python script, you will create a summary report to present your findings. Use the provided template to create your report. You must complete each of the following sections:
    • Introduction: Set the context for your scenario and the analyses you will be performing.
    • Data Visualization: Identify and interpret your chosen data visualization.
    • Descriptive Statistics: Identify and interpret measures of central tendency and variability.
    • Confidence Intervals: Identify and interpret the lower and upper limits of confidence intervals.
    • Conclusion: Summarize your findings and explain their practical implications.
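
The project notebook supplies its own code blocks, which are not reproduced here. As a rough idea of what the proportion interval involves, the following standalone sketch uses the normal-approximation (Wald) formula. The function name and the counts are assumptions for illustration and do not come from the NBA Elo data set.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical counts: 60 wins in 100 games (not from the NBA Elo data set).
low, high = proportion_confidence_interval(60, 100)
print(f"95% CI for the win proportion: ({low:.3f}, {high:.3f})")
```

The lower and upper limits returned here are the values the Confidence Intervals section of the report asks you to identify and interpret.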

Inferences On The Regression Line Assignment

Use software to compute confidence band limits at several values of x along the range of x.

Present these calculations in a table.

Use the data from this table to plot the confidence bands with fitted line and data observations. Interpret the confidence bands using your plot.
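
One way to produce such a table is sketched below, assuming the bands are pointwise confidence limits for the mean response in simple linear regression: fitted value ± t × s × √(1/n + (x0 − x̄)² / Sxx). The data are invented, and the t critical value (n − 2 degrees of freedom) is passed in by the caller rather than computed.

```python
import math

def confidence_band(xs, ys, x_points, t_crit):
    """Return (x0, fitted, lower, upper) rows for the mean response at each x0.

    t_crit is the two-sided t critical value with n - 2 degrees of freedom,
    supplied by the caller (e.g. read from a t table)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))               # residual standard error
    rows = []
    for x0 in x_points:
        fitted = intercept + slope * x0
        half_width = t_crit * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
        rows.append((x0, fitted, fitted - half_width, fitted + half_width))
    return rows

# Hypothetical data; 3.182 is the 95% two-sided t value for 3 degrees of freedom.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
for x0, fit, lo, hi in confidence_band(x, y, [1, 2, 3, 4, 5], t_crit=3.182):
    print(f"x={x0}: fit={fit:.2f}, band=({lo:.2f}, {hi:.2f})")
```

The rows can be pasted into a table and plotted. The band is narrowest at the mean of x and widens towards the ends of the range, which is the main feature to interpret.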

Statistics homework help

Discussion Assignment: Please see attached file.
I need this assignment back by 01/08/21.
Discussion Assignment: 
Consider the following weighted mean example:
The scores on a Mid-Term Exam for a sample of 50 statistics students are summarized in the following table.

The professor who gave the midterm wanted to get an idea of the average class grade on the midterm for course reevaluation. Here, one would determine x̄ = 74.8 (try it yourself before proceeding).  This would be our weighted mean.
For this discussion, I want you to come up with your own weighted mean problem related to your major and solve it. First, create a labeled table of values (use mine as a reference), then, provide the sample size, and lastly, determine the weighted mean. Why would someone want to know the weighted mean in your example from your major? 
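
The calculation behind a weighted mean is the sum of each value times its weight, divided by the sum of the weights. Because the table from the prompt is not reproduced here, the short Python sketch below uses an invented frequency table, so its result is not the 74.8 quoted above.

```python
# Hypothetical grouped scores (class midpoint, number of students); this is
# NOT the table from the prompt.
table = [(55, 5), (65, 10), (75, 20), (85, 10), (95, 5)]

sample_size = sum(freq for _, freq in table)
weighted_mean = sum(score * freq for score, freq in table) / sample_size

print(sample_size, weighted_mean)
```

Here the frequencies act as the weights, so the sample size is simply their total.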

Statistics homework help

Provide two 150-word responses, one each to RESPONSES 1 and 2 below. Responses may include direct questions. In your peer posts, compare the probabilities that you found with those of your classmates. Were they higher or lower, and why? In your responses, refer to the specific data from your classmates’ posts. Make sure you include your data set in your initial post as well. Attached are the Excel docs for both responses to help with the post.
RESPONSE 1:
This week we worked with averages, standard deviations, and especially probabilities.
The first step was to calculate a new standard deviation for a sample size of 4. In my case 14264/SQRT(4).
With this number and the previously calculated mean price of a vehicle, in my example 28232, we first calculate the percent chance that the next four vehicles will be 500 dollars below the mean. I came to a 47.2% chance.
The next probability we calculate is the odds of the next four cars being 1000 dollars above the mean. I came to a 44.4% chance this would be the case.
After that we calculate the odds that the next four cars would cost the same as my mean price. I came to a 50% chance that that would happen.
The final probability to calculate was if the next four cars would cost within 1500 dollars +- of the mean price. I came to a 16.7% chance that would happen.
Speaking for myself I found this post quite challenging and would welcome any critical eyes on work.
RESPONSE 2:
For this week’s forum we are asked to find the normal distribution of a set of vehicles and different probabilities.
The mean of all my vehicles without the supercar is 18,478 and I have a new standard deviation of 685.3123781.
The first question asks for the probability that the price will be more than $500 below the mean. To figure this out we take my mean and subtract 500 dollars to get 17978. That is p(x<17978). In Excel make sure to use NORM.DIST with the cumulative argument set to TRUE.
Secondly we are asked to find the probability that the price will be more than $1000 above the mean. To figure this out we take the mean and add 1000 dollars to get 19478. That is p(x>19478). In Excel make sure to use NORM.DIST with the cumulative argument set to TRUE.
Next we are asked to find the probability that the price will be equal to the mean. To figure this out we take the mean and equal it out against itself at 18478. That is p(x=18478). In Excel make sure to use NORM.DIST with the cumulative argument set to FALSE.
Finally we are asked to find the probability that the price will be within $1500 of the mean. To figure this out we take the mean, 18478, and subtract 1500 as well as add 1500 in a separate equation. Within Excel I input the normal distribution formula with the inputted numbers: =NORM.DIST(19978,18478,C28,TRUE)-NORM.DIST(16978,18478,C28,TRUE).
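
When comparing figures across posts, it can help to recompute them independently. The Python sketch below uses the mean and standard deviation quoted in Response 1 and treats the mean of four prices as normally distributed with standard error sd / √n. The variable names are illustrative.

```python
import math
from statistics import NormalDist

mean_price = 28232          # mean quoted in Response 1
sd_price = 14264            # standard deviation quoted in Response 1
n = 4

# Sampling distribution of the mean of n prices (standard error = sd / sqrt(n)).
sample_mean = NormalDist(mean_price, sd_price / math.sqrt(n))

p_below = sample_mean.cdf(mean_price - 500)            # more than $500 below
p_above = 1 - sample_mean.cdf(mean_price + 1000)       # more than $1000 above
p_within = (sample_mean.cdf(mean_price + 1500)
            - sample_mean.cdf(mean_price - 1500))      # within +/- $1500

print(f"{p_below:.3f} {p_above:.3f} {p_within:.3f}")
```

These three values agree with the 47.2%, 44.4% and 16.7% reported in Response 1. The "equal to the mean" question is a different matter: for a continuous distribution the probability of hitting any exact value is zero, and NORM.DIST with the cumulative argument set to FALSE returns the height of the density curve, not a probability.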

Statistics homework help

All posts must be 100% original work. NO PLAGIARISM. Post results must be provided using the Excel attached. Make sure you interpret your results in a Word document.
Using the data set you collected in Week 1, excluding the supercar outlier, you should have calculated the mean and standard deviation of the price data during Week 2, along with p and q in the Week 3 Excel workbook.  Using this information, calculate two 95% confidence intervals.  For the first interval, calculate a t confidence interval for the population mean.  You have the mean, standard deviation and the sample size; all you have left to find is the t critical value and you can calculate the interval.  For the second interval, calculate a proportion confidence interval using the proportion of cars that fall below the average.  You have p, q, and n; all that is left is calculating a z critical value.
Make sure you include these values in your post, so your fellow classmates can use them to calculate their own confidence intervals.  Once you calculate the confidence intervals you will need to interpret your interval and explain what this means in words.
Do the confidence intervals surprise you, knowing what you have learned about confidence intervals, proportions and normal distribution?  Please see the Week 5 Confidence T-Interval Mean and Unknown SD PDF and the Week 5 Confidence Interval Proportions PDF at the bottom of the discussion.  These give a step-by-step example of how to calculate the intervals using Excel.
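
For orientation, the two intervals described above have the standard textbook forms below. The notation is an assumption rather than something taken from the Week 5 PDFs: x̄ and s are the sample mean and standard deviation, n the sample size, p̂ the sample proportion, and α = 0.05 for a 95% interval.

```latex
\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
\qquad\text{and}\qquad
\hat{p} \pm z_{\alpha/2}\,\sqrt{\frac{\hat{p}\,\hat{q}}{n}},
\quad \hat{q} = 1-\hat{p}
```

At 95% confidence the z critical value is about 1.96 (NORM.S.INV(0.975) in Excel), while the t critical value depends on n and can be found with T.INV.2T(0.05, n-1).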

Statistics homework help

 
Write a 2- to 3-page critique of the research you found in the Walden Library that includes responses to the following prompts:

  • Why did the authors select binary logistic regression in the research?
  • Do you think this test was the most appropriate choice? Why or why not?
  • Did the authors display the results in a figure or table?
  • Does the results table stand alone? In other words, are you able to interpret the study from it? Why or why not?