The Project
The project is part of the Udacity Data Analyst Nanodegree. This section of the course is a case study on wine quality, using the UCI Wine Quality Data Set: https://archive.ics.uci.edu/ml/datasets/Wine+Quality
The case study introduces several new concepts that we can apply to the data set to analyse its attributes and determine which properties of a wine correspond to high ratings.
I downloaded the data from the link above and imported it into Python so that I could use a Jupyter Notebook to create the required report. A notebook lets us document and code in the same place, which is great for presenting findings and visualisations from the data.
I structured the project along the lines of the CRISP-DM method. That is, I (i) stated the objectives, (ii) decided what questions to ask of the data, (iii) carried out tasks to understand the data, and (iv) performed data wrangling and exploratory data analysis, then drew conclusions and answered the questions posed.
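As a rough illustration of the import step, the sketch below reads semicolon-delimited text into a pandas data frame. The UCI wine files use a semicolon separator, so sep=";" has to be passed explicitly. The three rows and the variable names (sample_csv, wine_df) are made up for illustration and are not taken from the notebook or the data set.

```python
import io

import pandas as pd

# Hypothetical three-row sample in the same semicolon-delimited layout as the
# UCI files; the values are invented for illustration only.
sample_csv = io.StringIO(
    '"fixed acidity";"pH";"alcohol";"quality"\n'
    "7.4;3.51;9.4;5\n"
    "7.8;3.20;9.8;5\n"
    "11.2;3.16;9.8;6\n"
)

# sep=";" is required because read_csv assumes a comma by default
wine_df = pd.read_csv(sample_csv, sep=";")

# Quick checks on what was loaded: dimensions and column types
print(wine_df.shape)
print(wine_df.dtypes)
```

With the downloaded data, the in-memory sample would be replaced by the path to the file, for example pd.read_csv("winequality-red.csv", sep=";"), assuming the file sits next to the notebook.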
The PDF report written to communicate my project and findings can also be found here
What We Learned
- Using hist() and plot() to build histogram visualisations
- Using pandas.plotting.scatter_matrix and plot() to build scatter plot visualisations
- Changing the figsize of a chart to a more readable format, and adding a ‘;’ to the end of the line to remove unwanted text
- Appending data frames together in Pandas
- Renaming data frame Columns in Pandas
- Using groupby() and query() in Pandas to aggregate and group selections of data
- Creating bar charts in Matplotlib and using Seaborn to add better formatting
- Adding appropriate labels, titles and colours
- Engineering proportions from the data so that data sets can be compared more easily
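The pandas steps in the list above can be sketched roughly as follows. This is a minimal example on two tiny invented data frames, not the code from the notebook: the column names (rating, colour) and all values are hypothetical. It uses pd.concat to append, because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical miniature data frames; names and values are invented
red_df = pd.DataFrame({"alcohol": [9.4, 10.5, 11.2], "rating": [5, 6, 7]})
white_df = pd.DataFrame(
    {"alcohol": [8.8, 9.5, 10.1, 12.0, 12.4], "rating": [6, 6, 5, 7, 7]}
)

# Tag each frame so the source is still known after combining
red_df["colour"] = "red"
white_df["colour"] = "white"

# Append one frame to the other
wine_df = pd.concat([red_df, white_df], ignore_index=True)

# Rename a column
wine_df = wine_df.rename(columns={"rating": "quality"})

# query() selects rows, groupby() aggregates them
high_quality = wine_df.query("quality >= 7")
mean_alcohol = wine_df.groupby("quality")["alcohol"].mean()

# Proportions: counts per quality divided by the total for that colour,
# so groups with different numbers of samples can be compared
counts = wine_df.groupby(["colour", "quality"]).size()
totals = wine_df.groupby("colour").size()
red_props = counts.loc["red"] / totals.loc["red"]
white_props = counts.loc["white"] / totals.loc["white"]

print(red_props)
print(white_props)
```

In a notebook, a call such as red_props.plot(kind="bar", figsize=(8, 6)); would draw the proportions as a bar chart. The figsize argument sets the chart size, and the trailing semicolon suppresses the text that would otherwise be printed under the cell.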
The Code and the Report
- GitHub repository for the data, SQL, PDF report and Jupyter Notebook
- The PDF report can also be found here
References
- Image: https://www.pexels.com/photo/restaurant-bar-wine-red-wine-87224/
- UCI Wine Quality Data Set: https://archive.ics.uci.edu/ml/datasets/Wine+Quality
- Udacity Data Analyst Nanodegree: https://eu.udacity.com/course/data-analyst-nanodegree--nd002?v=a
- Plot Documentation: https://matplotlib.org/users/pyplot_tutorial.html