## A Simple Explanation of Bayes Theorem and Bayesian Inference

While studying the excellent Udacity Data Analyst Nanodegree, I found myself struggling to answer the quiz questions on Bayes’ Theorem. To help me understand it, I studied a fair few other resources too, and concluded that it would be helpful to write an article myself, both a. to reinforce this difficult subject and b. to help others.

In this article I will explain Bayes’ Theorem simply and walk through an example.

## What Is Bayes Theorem?

Bayes’ Theorem is a widely used result in probability and statistics, which makes it very important in data science and data analysis. For example, it underpins Bayesian inference, an approach to statistical inference in which we update the probability of a hypothesis as more data or information becomes available.

## What Are its Applications?

For example, it can be used to estimate the probability that a financial transaction is fraudulent, to interpret the result of a medical test, or to estimate the chance of a particular return on stocks, along with hundreds of other applications in every industry imaginable, from finance to sport, medicine to engineering, video games to music.

## What Does it Do?

As mentioned above, Bayesian inference gives us the probability of an event given certain evidence, such as a test result.

First, there are a few things to keep in mind:

• The test for fraud is separate from whether the transaction actually is fraud.
• Tests are not perfect, so they give us false positives (telling us a transaction is fraud when in reality it isn’t) and false negatives (where the test misses fraud that does exist).
• Bayes’ Theorem turns the result of a test into the actual probability of the event, given that result.
• We start with a prior probability and combine it with our evidence, which results in our posterior probability.
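The last bullet describes a cycle: a prior, combined with evidence, gives a posterior, and that posterior can become the prior for the next piece of evidence. Here is a minimal Python sketch of that update for a yes/no hypothesis. The function name `bayes_update` and the fraud-detector rates (2% of transactions fraudulent, 95% of fraud flagged, 5% of legitimate transactions flagged) are hypothetical, and applying it twice assumes the two flags are conditionally independent given whether the transaction really is fraud.

```python
def bayes_update(prior: float, p_pos_given_h: float, p_pos_given_not_h: float) -> float:
    """Return P(H | positive result) from the prior P(H) and the test's hit rates."""
    true_pos = p_pos_given_h * prior                # P(positive and H)
    false_pos = p_pos_given_not_h * (1.0 - prior)   # P(positive and not H)
    return true_pos / (true_pos + false_pos)        # normalise over all positives


# Hypothetical fraud detector (illustrative numbers only).
belief = 0.02  # prior: 2% of transactions are fraudulent
for flag_number in (1, 2):
    # Assumes each flag is conditionally independent of the others given the truth,
    # so yesterday's posterior can simply be reused as today's prior.
    belief = bayes_update(belief, p_pos_given_h=0.95, p_pos_given_not_h=0.05)
    print(f"after flag {flag_number}: P(fraud) = {belief:.3f}")  # 0.279, then 0.880
```

With these made-up rates, one flag lifts the probability of fraud from 2% to about 28%, and a second independent flag lifts it to about 88%: the probability is adjusted as more evidence arrives.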

## How Does it Work? An Example

Consider testing for cancer as an example, where we want to find the probability that a patient has cancer given a particular test result.

• The chance that a patient has this type of cancer is 1%, written as P(C) = 1%: the prior probability
• The test result is positive 90% of the time if you have C, written as P(Pos | C): the sensitivity. The remaining 100% - 90% = 10% are negative results where there is C but the test misses it: the false negatives, P(Neg | C)
• The test result is negative 90% of the time if you do not have C, written as P(Neg | ¬C): the specificity. The remaining 100% - 90% = 10% are positive results where there is no C but the test misdiagnoses it: the false positives, P(Pos | ¬C)

Let’s put this in a table so it’s a bit more readable.

| | Cancer (1%) | Not Cancer (99%) |
|---|---|---|
| Positive Test | 90% | 10% |
| Negative Test | 10% | 90% |

• Our posterior probability is what we’re trying to find: the chance that cancer is actually present given a positive test, written as P(C | Pos). To get it, we have to take account of false positives as well as true positives.
• First, the joint probability of having cancer and testing positive: P(Pos | C) × P(C) = 0.9 × 0.01 = 0.009
• Similarly, the joint probability of not having cancer and testing positive: P(Pos | ¬C) × P(¬C) = 0.1 × 0.99 = 0.099

Let’s add this to our table.

| | Cancer (1%) | Not Cancer (99%) |
|---|---|---|
| Positive Test | True Pos: 90% × 1% = 0.009 | False Pos: 10% × 99% = 0.099 |
| Negative Test | False Neg: 10% × 1% = 0.001 | True Neg: 90% × 99% = 0.891 |

But that’s not the complete story: we need to account for every way a positive result could happen.

The chance of getting a true positive result is 0.009. The chance of getting any positive result is the chance of a true positive plus the chance of a false positive (0.009 + 0.099 = 0.108).

So our actual posterior probability of cancer given a positive test is 0.009 / 0.108 ≈ 0.0833, or about 8.3%.
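The arithmetic in the last few paragraphs can be reproduced in a few lines of Python. This sketch recomputes the four cells of the table from the prior, sensitivity and specificity, then normalises by the total chance of a positive result; the variable names are illustrative.

```python
# Inputs from the cancer example.
p_cancer = 0.01      # P(C), the prior
sensitivity = 0.90   # P(Pos | C)
specificity = 0.90   # P(Neg | not C)

# The four cells of the table (joint probabilities).
true_pos = sensitivity * p_cancer                # 0.009
false_neg = (1 - sensitivity) * p_cancer         # 0.001
false_pos = (1 - specificity) * (1 - p_cancer)   # 0.099
true_neg = specificity * (1 - p_cancer)          # 0.891

# Every way of getting a positive result, then the share of those that are real.
p_positive = true_pos + false_pos                # 0.108
posterior = true_pos / p_positive

print(f"P(C | Pos) = {posterior:.4f}")  # 0.0833
```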

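In Bayes’ Theorem terms, where c is the event that a patient has cancer and x is a positive test result, this is written as:

P(c|x) = P(x|c) × P(c) / [P(x|c) × P(c) + P(x|¬c) × P(¬c)] = (0.9 × 0.01) / (0.9 × 0.01 + 0.1 × 0.99) ≈ 8.3%

where: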

• P(c|x) = Chance of having cancer (c) given a positive test (x). This is what we want to know: How likely is it to have cancer with a positive result? In our case it was 8.3%.
• P(x|c) = Chance of a positive test (x) given that you had cancer (c). This is the chance of a true positive, 90% in our case.
• P(c) = Chance of having cancer (1%).
• P(¬ c) = Chance of not having cancer (99%).
• P(x|¬ c) = Chance of a positive test (x) given that you didn’t have cancer (¬ c). This is the false positive rate, 10% in our case.
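As a sanity check on the 8.3% figure, we can simulate a large number of patients and simply count: of all the positive results, what fraction belong to patients who really have cancer? This sketch uses Python’s standard `random` module; the patient count and seed are arbitrary choices, so the result only approximates 1/12.

```python
import random


def simulate(n_patients: int, seed: int = 0) -> float:
    """Estimate P(c | x) by simulating patients and their test results."""
    rng = random.Random(seed)
    positives = 0
    cancer_and_positive = 0
    for _ in range(n_patients):
        has_cancer = rng.random() < 0.01              # P(c) = 1%
        p_positive = 0.90 if has_cancer else 0.10     # P(x|c) = 90%, P(x|¬c) = 10%
        if rng.random() < p_positive:                 # this patient tests positive
            positives += 1
            if has_cancer:
                cancer_and_positive += 1
    return cancer_and_positive / positives


print(f"simulated P(c|x) is about {simulate(200_000):.3f}")  # close to 1/12, i.e. 0.083
```

Roughly 1 positive result in 12 comes from a patient with cancer; the other 11 are false positives from the much larger group of healthy patients.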

## Data is the New Frontier

The demand for strong data skills is skyrocketing. With the rapid growth of Big Data, maturing tools and technologies, and cloud computing coming to fruition, we’re amidst what some are calling the fourth industrial revolution and the Information Age.

Every major organisation is using its data and analytics to find new opportunities and cost savings; those that are not can expect to be left behind. As things stand, 77% of top organisations consider data analytics a critical component of business performance.

## Why You Should Learn and Grow Data Skills

Globally, companies are clamouring for people with strong capabilities in capitalising on data. Your data skills can bring you better job prospects and pay, as well as greater influence over policy and decision making in the workplace.

CIOs globally ranked analytics and business intelligence as the most critical technology to achieve the organization’s business goals. Naturally, data and analytics skills are the No. 1 sought-after talent.

Sarah Hippold, Gartner article
August 17, 2018

Well I’m not driven by money, and I’m certainly not driven to be the centre of attention! So why do I spend time learning?

• The data landscape is changing. Oracle and SQL experience got me into the industry. Today, while Oracle is still a giant, you could argue that it is technologically behind in Big Data. Data is the new business language. What’s more, as the data industry rises and software engineering, artificial intelligence and automation combine, today’s skill requirements are changing rapidly. The industry is moving at a rapid pace, and to learn is to stay relevant.
• It’s really satisfying to grow your skills, and then to get to use them! If you are in the data industry, this is the perfect chance to learn Big Data, analytics and visualisation: you can apply those skills almost instantly, as companies are crying out for them.
• Shape your career for the next decade: Big Data Engineer, Data Analyst, Machine Learning Developer, Data Scientist, Data Architect, DevOps Scrum Master. There is a genuine, growing industry here. Roles are diverging into distinct skill sets, so you can choose what interests you most and have a successful career. What’s more, data is not tied to any one industry: the skills used in finance or banking can be transferred to video gaming, movie streaming, logistics or e-commerce, wherever your interests lie!

# What is CRISP-DM?

CRISP-DM is a process methodology that provides structure for data-mining and analysis projects. It stands for Cross-Industry Standard Process for Data Mining. According to polls by the popular data science website KDnuggets, it is the most widely used process for data mining.

The process revolves around six major steps:

1. Business Understanding

Start by focusing on understanding the objectives and requirements from a business perspective, then use this knowledge to define the data problem and project plan.

2. Data Understanding

As with every data project, there is an initial hurdle: collecting the data and familiarising yourself with it, identifying data quality issues, discovering initial insights, and detecting interesting nuggets of information that might form a hypothesis for analysis.

3. Data Preparation

The nitty-gritty dirty work of preparing the data: cleaning, merging and moulding it to form a final dataset that can be used in modeling.
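Much of that work is repetitive cleaning and joining. The sketch below shows it in plain Python on two tiny tables; the records, field names and the `clean` and `merge` helpers are hypothetical, and in practice a library such as pandas is a common choice for the same steps.

```python
# Two hypothetical raw sources.
customers = [
    {"id": "1", "name": " Alice "},   # stray whitespace
    {"id": "2", "name": "Bob"},
    {"id": "2", "name": "Bob"},       # exact duplicate
    {"id": "3", "name": None},        # missing value
]
orders = [
    {"customer_id": "1", "amount": "20.50"},
    {"customer_id": "2", "amount": "5"},
]


def clean(rows):
    """Drop rows with missing values, strip whitespace, and remove exact duplicates."""
    seen = set()
    result = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue
        tidy = {key: value.strip() for key, value in row.items()}
        fingerprint = tuple(sorted(tidy.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            result.append(tidy)
    return result


def merge(customers, orders):
    """Inner-join orders onto customers and cast the amount to a number."""
    by_id = {c["id"]: c for c in customers}
    return [
        {"name": by_id[o["customer_id"]]["name"], "amount": float(o["amount"])}
        for o in orders
        if o["customer_id"] in by_id
    ]


final_dataset = merge(clean(customers), clean(orders))
print(final_dataset)  # [{'name': 'Alice', 'amount': 20.5}, {'name': 'Bob', 'amount': 5.0}]
```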

4. Modeling

At this point we decide which modeling techniques to use, and build the models.

5. Evaluation

Once we appear to have model results of good enough quality, we test them against unseen data to make sure they perform well there too, and check that all key business issues have been answered.
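Testing against unseen data is usually done with a hold-out split: fit the model on one part of the data and score it on the part it never saw. Everything here (the synthetic data, the 80/20 split, the seed and the one-threshold "model") is a hypothetical toy chosen to keep the example dependency-free.

```python
import random

# Hypothetical labelled data: one feature x, labelled True when x >= 50.
data = [(x, x >= 50) for x in range(100)]


def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold back a fraction as unseen test data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def fit_threshold(train):
    """Toy model: a threshold halfway between the largest negative and smallest positive x."""
    max_negative = max(x for x, label in train if not label)
    min_positive = min(x for x, label in train if label)
    return (max_negative + min_positive) / 2


def accuracy(threshold, rows):
    """Fraction of rows where the prediction (x >= threshold) matches the label."""
    return sum((x >= threshold) == label for x, label in rows) / len(rows)


train, test = holdout_split(data)
threshold = fit_threshold(train)
print(f"train accuracy: {accuracy(threshold, train):.2f}")
print(f"test accuracy:  {accuracy(threshold, test):.2f}")
```

The comparison is the point: a model that scores much better on its training data than on held-out data is likely overfitting and is not ready for deployment.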

6. Deployment

At this stage we are ready to deploy the code representation of our model into a production environment and solve our original business problem.

# Why Use It?

### Puts the Business Problem First

One of the greatest advantages is that it puts business understanding at the centre of the project. This means we are concentrating on solving the business’s needs first and foremost and trying to deliver value to our stakeholders.

### Commonality of Structured Solutions

As a manager of Data Scientists and Analysts, I find it also ensures we stick to a common methodology that the whole team can follow, so we consistently apply best practice and tackle common issues.

### Flexible

It’s also not a rigid structure: it’s malleable and its steps are repeatable, and you will often naturally go back through the steps to optimise your final dataset for modeling.

It does not even need to be mining- or model-related. Since so many of today’s business problems require extensive data analysis and preparation, the methodology can flex to suit other categories of solution, such as recommender systems, sentiment analysis and NLP, amongst others.

# A Template in Jupyter

Since we have a structured process to follow, we are likely to have re-usable steps and components, which is ideal for re-usable code. What’s more, a Jupyter notebook can hold the documentation for our business understanding and a description of each step in the process.

To that end, I have a re-usable notebook on my DataSci_Resources GitHub Repository here
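As a rough idea of what re-usable components might look like, here is a hypothetical skeleton with one function per CRISP-DM phase and a loop that goes back round from Evaluation, reflecting the iterative nature of the process. The function names, the shared `ctx` dictionary and the placeholder pass rule are all assumptions for illustration, not the contents of the notebook mentioned above.

```python
from typing import Any, Callable, Dict, List

# Shared state passed from phase to phase: outputs, notes and decisions.
Context = Dict[str, Any]


def business_understanding(ctx: Context) -> Context:
    ctx["objective"] = "business question and success criteria"
    return ctx


def data_understanding(ctx: Context) -> Context:
    ctx["data_notes"] = "sources, quality issues, first insights"
    return ctx


def data_preparation(ctx: Context) -> Context:
    ctx["dataset"] = "cleaned, merged modelling dataset"
    return ctx


def modeling(ctx: Context) -> Context:
    ctx["model"] = f"model v{ctx.get('attempts', 0) + 1}"
    return ctx


def evaluation(ctx: Context) -> Context:
    # Placeholder rule: pretend the first model falls short, so the loop goes back once.
    ctx["attempts"] = ctx.get("attempts", 0) + 1
    ctx["passed"] = ctx["attempts"] >= 2
    return ctx


def deployment(ctx: Context) -> Context:
    ctx["deployed"] = True
    return ctx


def run_project(max_iterations: int = 3) -> List[str]:
    """Run the phases in order, looping back from Evaluation until it passes."""
    ctx: Context = {}
    log: List[str] = []

    def step(phase: Callable[[Context], Context]) -> None:
        nonlocal ctx
        ctx = phase(ctx)
        log.append(phase.__name__)

    step(business_understanding)
    for _ in range(max_iterations):
        for phase in (data_understanding, data_preparation, modeling, evaluation):
            step(phase)
        if ctx["passed"]:
            step(deployment)
            break
    return log


print(run_project())
```

In a notebook, each function would become a documented section, so the same skeleton can be copied from project to project.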

## What is the Pareto Principle?

The Pareto Principle (also commonly known as the 80/20 principle) is an observation that roughly 80 percent of outputs come from 20 percent of inputs. It was first observed by the Italian economist Vilfredo Pareto, who noticed that 80% of the land in Italy was owned by 20% of its population. He found that this pattern held roughly true in other countries and situations as well.

The Pareto Principle is a neat rule of thumb for describing distributions in real-life scenarios, and it holds roughly true in a vast array of situations. That is, the inputs in a scenario contribute unequally to its outputs.

For example:

• A common adage in computer science is that 20% of features account for 80% of usage
• Microsoft noted that 20% of bugs cause 80% of crashes, and also found that 20% of effort produced 80% of features
• 20% of customers contribute to 80% of income
• 20% of workers contribute 80% of the work

It’s not simply a case of investing the same amount of input and getting an equal value out.

## Why Use the Pareto Principle

I want to propose that this observation is valuable in project management and that, beyond understanding the underlying statistic, adhering to it as a principle can give a massive return on investment, whether in your own life or in your work.

If we accept that 20% of the effort produces 80% of the results in a project or product, it follows that the remaining 80% of the effort produces only 20% of the results. In investment terms, that is a massive investment of resources for a diminishing return (the law of diminishing returns). You wouldn’t want your investment banker running those odds, so why accept them in life or in project management?

Instead of investing much more effort and resources to ‘complete’ a project or product, we could focus primarily on the efforts that produce the majority of the results and forget the rest, or at least use this to make an informed decision to prioritise investments in other projects before coming back to ‘complete’ the project.

Considering this, with the 80% of resources saved, we can invest in further projects and products and get 80% of the results from each of them: huge returns for the same inputs!
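Under the stylised assumption that every project splits exactly 80/20 (real projects rarely do), the argument of the last two paragraphs reduces to simple arithmetic. The budget of 100 effort units and the variable names are illustrative.

```python
# Stylised assumption: in every project, the first 20% of effort delivers 80% of
# the results, and the remaining 80% of effort delivers the last 20%.
budget = 100.0  # effort units: enough to fully complete exactly one project

# Results delivered per unit of effort in each part of a single project.
first_phase_rate = 80 / 20    # 4.0 percentage points of results per unit
second_phase_rate = 20 / 80   # 0.25 points per unit

# Option A: spend the whole budget completing one project (100% of its results).
one_complete = 100.0

# Option B: spend 20 units on each of several projects, stopping at their 80% mark.
n_projects = int(budget // 20)
many_partial = n_projects * 80.0

print(first_phase_rate / second_phase_rate)  # 16.0
print(one_complete, many_partial)            # 100.0 400.0
```

Effort in the first 20% is 16 times as productive as effort in the last 80%, so the same budget yields four projects’ worth of results instead of one, provided the projects are independent and equally valuable.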

## Conclusion

As project managers, it’s our responsibility to find the most efficient way to get projects completed. Some tasks generate a disproportionate amount of work for the results they deliver.

With this in mind, I want you to make a conscious decision about how we allocate resources, rather than always aiming for the perfect final product. You may very well want the perfect product, but the key is that we have a choice.

For example:

• Create 5 wire-frame prototypes instead of 1 detailed one
• Build 5 features with 80% of the functionality rather than 1 perfect one
• Find solutions to 5 bugs that each solve the issue for 80% of users, rather than 1 that resolves it for everyone

That said, if we still need the final product 100% complete, the point is to make an informed decision now that optimises our investments: focus first on the 20%’ers that give the best bang for our buck, re-prioritising as we see fit, before returning to reach 100%.

“The difference between successful people and very successful people is that very successful people say “no” to almost everything.”

— Warren Buffett

## What is Amazon Web Services and Cloud Technology?

I’ve recently been reading up on Amazon Web Services, a cloud computing platform run by technology giant Amazon. I thought I’d write up and share what I’ve found, both to cement my understanding and, hopefully, to teach others at the same time.

## What is a Cloud Platform?

First of all, what exactly is a Cloud Platform? A cloud platform (or cloud computing) essentially offers everything a normal server or computing architecture would, but delivered securely over the internet: raw computing power, database storage, applications, content delivery and other functionality. Think of it as a utility you rent, in the same way as your electricity or gas, except that what you are using is computing power, whether for storage, streaming or another service.

## What is Amazon Web Services?

Amazon Web Services (AWS for short) is a secure cloud platform offered by technology giant Amazon (you may have heard of them!). AWS offers huge computing power, massive database storage, content delivery and a wide suite of other services, all of which are easy to scale, grow and keep up to date.

According to Amazon:

“Amazon Web Services (AWS) provides on-demand computing resources and services in the cloud, with pay-as-you-go pricing. For example, you can run a server on AWS that you can log on to, configure, secure, and run just as you would a server that’s sitting in front of you.”

## Why Use a Cloud Platform?

Traditionally, businesses hosted their computing platforms locally, either on their own premises or at another site they owned. The business physically owns the entire infrastructure and architecture, and bears the large recurring cost of running, maintaining, servicing, expanding, upgrading and even powering that hardware. With a cloud platform, the cloud host owns the computing platform and effectively rents it out to anyone who needs it, when they need it, so businesses can save the cost of running their own platforms.

## The Benefits of Cloud Computing

1. Cost Savings: By hosting data centres and computing in the cloud rather than locally, businesses can make significant savings on physical space, disaster recovery and utility power. What’s more, cloud services are pay-as-you-go, meaning you only pay for the features and storage capacity you actually use.
2. Security: There is a misconception that storing your files and data off-site and accessing everything over the internet is less secure than keeping it all locally. This is counter to the truth: a cloud host’s primary concern is monitoring and maintaining security, and it employs the best tools and expertise to do so. This is significantly more efficient than bespoke in-house security, since a business must divide its resources among many technology concerns, security being only one. Additionally, a high percentage of data thefts are actually perpetrated by a company’s own employees, so it can be much safer to keep sensitive information off-site, where access is logged and locked behind security.
3. Agility & Flexibility: Cloud computing is made remarkably easy for organisations; after all, making it easy is in the Cloud Host’s interest. Whenever a business needs to change anything about its architecture, a cloud-based service can be changed almost instantly, which is much quicker than undertaking an expensive and often complex change to existing infrastructure. What’s more, Cloud Hosts offer a massive breadth of systems and tools, and can support many more through open-source and third-party offerings. Everything you need is available at the click of a button, as and when you need it, or can be scaled up and down automatically based on your usage.
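The pay-as-you-go and scaling points can be illustrated together with simple arithmetic. Every number here (the monthly demand and the price per server) is hypothetical, and the model deliberately uses the same unit price for both options, ignoring that real cloud and on-premises prices differ; it only shows why paying for capacity that tracks demand can beat paying for peak capacity all year.

```python
# Hypothetical number of servers needed each month, e.g. a retailer with a
# holiday peak (illustrative numbers, not real usage data).
monthly_demand = [4, 4, 5, 5, 6, 6, 7, 8, 10, 12, 20, 30]

# Hypothetical price per server per month, identical for both options so the
# comparison isolates the effect of paying only for what is used.
PRICE_PER_SERVER_MONTH = 100.0

# Owning the hardware: capacity must cover the peak month and is paid for all year.
fixed_cost = max(monthly_demand) * PRICE_PER_SERVER_MONTH * len(monthly_demand)

# Pay-as-you-go with capacity scaled to demand: pay only for servers used each month.
on_demand_cost = sum(monthly_demand) * PRICE_PER_SERVER_MONTH

print(f"fixed capacity: {fixed_cost:,.0f}")     # 36,000
print(f"pay-as-you-go:  {on_demand_cost:,.0f}")  # 11,700
```

The whole gap comes from idle capacity: the spikier the demand, the bigger the difference, while a business with flat, constant demand would see none under this model.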

## How is Cloud Computing Changing Data?

Data is valuable. When you think about it, information, or intelligence, has always held value throughout history. Census information has been collected for centuries for more efficient taxation, farm yields for feeding the population through winter, and army troop counts, movements and equipment for waging war. Now we call it data, and every single piece of information, intelligence or data holds value. Within the millions of bits of information surrounding every action that you, your business or your customers take are nuggets of invaluable, actionable information just waiting to be identified and acted upon.

What has changed through time is the volume of data we can gather and store. With cloud computing, we can truly have Big Data: the storage capacity to collect every nugget of data we can, and analysis tools provided by the Cloud Host that make it easy to analyse for insight. Through these insights, a business can increase efficiency and better understand its users or customers.

## Some Technological Products of AWS

A handful of technologies that might interest a Data Engineer or Data Scientist:

• Amazon RDS – Managed Relational Database Service for MySQL, PostgreSQL, Oracle, SQL Server and MariaDB
• Amazon Redshift – Fast, Simple, Cost-effective Data Warehousing
• Amazon ElastiCache – In-memory Caching System
• Amazon EMR – Hosted Hadoop Framework
• Amazon Kinesis – Work with Real-time Streaming Data
• AWS Glue – Prepare and load data
• Amazon QuickSight – Fast Business Analytics Service
• Amazon SageMaker – Build, train, and deploy Machine Learning at Scale
• Amazon Comprehend – discover insights and relationships in text
• Amazon Lex – Build voice and text chatbots

And many, many more.

## Some Case Studies Using AWS

• Airbnb: “Airbnb believes that AWS saved it the expense of at least one operations position. Additionally, the company states that the flexibility and responsiveness of AWS is helping it to prepare for more growth”
• Epic Games: “Creator of Fortnite, the multiplayer battle royale game that has become a global phenomenon, relies on AWS for its expansive infrastructure, unmatched reliability, and global scale”
• Netflix: “AWS enables Netflix to quickly deploy thousands of servers and terabytes of storage within minutes. Users can stream Netflix shows and movies from anywhere in the world, including on the web, on tablets, or on mobile devices such as iPhones.”
• Pinterest: “By using AWS, the company can maintain developer velocity and site scalability, manage multiple petabytes of data each day, and perform daily refreshes of its massive search index.”
• Expedia: “By using AWS, Expedia has become more resilient. Expedia’s developers have been able to innovate faster while saving the company millions of dollars. Expedia provides travel-booking services across its flagship site Expedia.com and about 200 other travel-booking sites around the world.”