This project is part of the Udacity Data Analysis Nanodegree. In this section of the course, we perform our own data analysis to determine whether a website should change its page design from an old page to a new page, based on the results of an A/B test on a subset of users.
The project brings together several concepts taught over the duration of the course and applies them to the data set. Using various statistical methods, we analyse the data and estimate the probability of a user converting, depending on whether the user saw the old page or the new page.
The PDF report written to communicate my project and findings can also be found here
What We Learned
- Using proportions to find probability.
- Writing hypothesis statements and testing against them
- Writing out hypotheses and observations in accurate terminology
- Using statsmodels to simulate 10,000 examples from a sample dataset, and finding differences from the mean
- Plotting differences from the mean in a plt.hist histogram, and adding a reference line for the actual observed difference
- Using logistic regression to determine probabilities for one of two possible outcomes
- Creating dummy variables to make categorical variables usable in regression
- Creating interaction variables to better represent attributes in combination for use in regression
- Interpreting regression summary() results and drawing accurate conclusions from them
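
The simulation step in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: it uses NumPy's binomial sampler for the simulation (an assumption about tooling), the sample sizes and conversion counts are made-up placeholders, and the hypotheses are assumed to be H0: p_new - p_old <= 0 against H1: p_new - p_old > 0.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical counts for illustration only (not the project's data).
n_old, n_new = 5000, 5000      # users shown each page
conv_old, conv_new = 600, 590  # users who converted on each page

# Observed conversion rates and their difference.
p_old_obs = conv_old / n_old
p_new_obs = conv_new / n_new
obs_diff = p_new_obs - p_old_obs

# Under the null hypothesis both pages share one conversion rate,
# estimated here by pooling the two groups.
p_null = (conv_old + conv_new) / (n_old + n_new)

# Simulate 10,000 experiments under the null and record the
# difference in conversion rates for each one.
n_sims = 10_000
new_rates = rng.binomial(n_new, p_null, size=n_sims) / n_new
old_rates = rng.binomial(n_old, p_null, size=n_sims) / n_old
p_diffs = new_rates - old_rates

# One-sided p-value: share of simulated differences that exceed
# the observed difference.
p_value = (p_diffs > obs_diff).mean()

print(f"observed difference: {obs_diff:.4f}")
print(f"simulated p-value:   {p_value:.3f}")
```

To reproduce the histogram described above, the simulated differences could be passed to `plt.hist(p_diffs)` and the observed difference marked with `plt.axvline(obs_diff)`, assuming matplotlib is imported as `plt`. A large p-value would mean we fail to reject the null hypothesis that the new page is no better than the old one.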
The Code and the Report
- GitHub repository for the data, PDF report and Jupyter Notebook
- The PDF report can also be found here