Swapnil Saurav

Monte Carlo Simulation

Monte Carlo simulation is a computerized mathematical technique that generates random sample data based on a known distribution for numerical experiments. The method is applied to quantitative risk analysis and decision-making problems, and is used by professionals in fields such as finance, project management, energy, manufacturing, engineering, research & development, insurance, oil & gas, and transportation.

The method was first used by scientists working on the atom bomb in the 1940s. It can be used in situations where we need to make estimates and decisions under uncertainty, such as weather forecasting.

The Monte Carlo Simulation Formula

We would like to accurately estimate the probabilities of uncertain events. For example, what is the probability that a new product’s cash flows will have a positive net present value (NPV)? What is the risk factor of our investment portfolio? Monte Carlo simulation enables us to model situations that present uncertainty and then play them out on a computer thousands of times.
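As a minimal sketch of the idea (with purely hypothetical cash-flow figures), the Python snippet below plays out an uncertain NPV calculation ten thousand times and estimates the probability that the NPV is positive:

import numpy as np

rng = np.random.default_rng(42)
n_trials = 10_000
discount_rate = 0.10
initial_investment = 1_000   # hypothetical upfront cost

npv = np.zeros(n_trials)
for i in range(n_trials):
    # Each year's cash flow is uncertain: sampled from a normal distribution
    cash_flows = rng.normal(loc=[300, 400, 500], scale=[75, 100, 125])
    years = np.arange(1, len(cash_flows) + 1)
    npv[i] = np.sum(cash_flows / (1 + discount_rate) ** years) - initial_investment

print("Estimated P(NPV > 0):", np.mean(npv > 0))
print("Average NPV:", round(float(np.mean(npv)), 2))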

Many companies use Monte Carlo simulation as an important part of their decision-making process. Here are some examples.

  • General Motors, Procter & Gamble, Pfizer, Bristol-Myers Squibb, and Eli Lilly use simulation to estimate both the average return and the risk factor of new products. At GM, this information is used by the CEO to determine which products come to market.
  • GM uses simulation for activities such as forecasting net income for the corporation, predicting structural and purchasing costs, and determining its susceptibility to different kinds of risk (such as interest rate changes and exchange rate fluctuations).
  • Lilly uses simulation to determine the optimal plant capacity for each drug.
  • Procter & Gamble uses simulation to model and optimally hedge foreign exchange risk.
  • Sears uses simulation to determine how many units of each product line should be ordered from suppliers—for example, the number of pairs of Dockers trousers that should be ordered this year.
  • Oil and drug companies use simulation to value “real options,” such as the value of an option to expand, contract, or postpone a project.
  • Financial planners use Monte Carlo simulation to determine optimal investment strategies for their clients’ retirement.


Exponential Smoothing Forecasting – Examples

Example 1:

Exponential Smoothing Forecasting – Example

Let’s consider α = 0.2 for the above-given data values, so enter the value 0.8 in the Damping Factor box and repeat the Exponential Smoothing method.

The result is shown below:

Exponential Smoothing Forecasting – Example #2

Let’s consider α = 0.8 for the above-given data values, so enter the value 0.2 in the Damping Factor box and repeat the Exponential Smoothing method.

The result is shown below:

Now, if we compare the results of all three Excel Exponential Smoothing examples above, we can draw the following conclusions:

  • The smaller the Alpha (α) value, the higher the damping factor, and the more the peaks and valleys are smoothed out.
  • The higher the Alpha (α) value, the smaller the damping factor, and the closer the smoothed values are to the actual data points (see the sketch after this list).
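As a minimal sketch of the same calculation Excel performs (simple exponential smoothing, here on a hypothetical demand series), the Python snippet below makes the effect of α visible:

# Simple exponential smoothing: forecast_t = alpha * actual_(t-1) + (1 - alpha) * forecast_(t-1)
# Excel's Damping Factor is 1 - alpha. The demand values below are hypothetical.
def exponential_smoothing(series, alpha):
    forecasts = [series[0]]   # seed the first forecast with the first actual value
    for t in range(1, len(series)):
        forecasts.append(alpha * series[t - 1] + (1 - alpha) * forecasts[t - 1])
    return forecasts

demand = [120, 135, 128, 150, 170, 145, 160]
print(exponential_smoothing(demand, alpha=0.2))   # heavily smoothed
print(exponential_smoothing(demand, alpha=0.8))   # closer to the actual data points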

Things to Remember

  • The higher the value of the damping factor, the more the peaks and valleys in the dataset are smoothed out.
  • Exponential Smoothing in Excel is a very flexible method to use and easy to calculate.
Multiple Linear Regression Code

Multiple linear regression (MLR) is a statistical technique that uses several explanatory variables to predict the outcome of a response variable. A linear regression model that contains more than one predictor variable is called a multiple linear regression model. The goal of multiple linear regression (MLR) is to model the relationship between the explanatory and response variables.

The model for MLR, given n observations, is:
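In standard notation (with p predictors),

y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \epsilon_i, \quad i = 1, \dots, n

where y_i is the response, x_{i1}, \dots, x_{ip} are the predictor values for observation i, \beta_0, \dots, \beta_p are the regression coefficients, and \epsilon_i is the error term.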

Let’s take an example:

The dataset has 5 columns containing an extract from the Profit and Loss statements of 50 startup companies. It describes each company's R&D, Administration, and Marketing spend, the state in which the company is based, and the profit the company realized that year. A venture capitalist (VC) would be interested in such data and would want to see whether factors like R&D Spend, Administration expenses, Marketing Spend, and State play any role in the profitability of a startup. This analysis would help the VC make investment decisions in the future.

Profit is the dependent variable and other variables are independent variables.

Dummy Variables

Let’s look at the dataset we have for this example:

One challenge we face while building the linear model is handling the State variable. The State column has a categorical value and cannot be treated like any other numeric value. We need to add a dummy variable for each categorical value, as below:

Add 3 columns, one for each categorical value of State. Put a 1 in the column whose header matches the row's State value: a row containing New York will have 1 under the column header New York, and the rest of the values in that column will be zero. Similarly, we fill in the California and Florida columns. The three additional columns we added are called dummy variables, and these will be used in our model building; the original State column can be ignored. We can also drop the New York column from the analysis, because a row with zero under both California and Florida implicitly implies that New York would have a value of 1. We always use one fewer dummy variable than the total number of categories to avoid the dummy variable trap.
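As a quick illustration (assuming the column is named State, as in the dataset description), pandas can build these dummy columns directly; drop_first=True keeps one dummy fewer than the number of categories and so avoids the trap. The full scikit-learn version used in this article follows below.

import pandas as pd
data_df = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/3_Startups.csv")
# One-hot encode the State column; drop_first=True drops one dummy column
# so the dummy variable trap described above is avoided.
encoded_df = pd.get_dummies(data_df, columns=["State"], drop_first=True)
print(encoded_df.head())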

Python code:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
#Dataset
data_df = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/3_Startups.csv")
#Getting X and Y values
X = data_df.iloc[:, :-1].values
y = data_df.iloc[:, -1].values

#Encoding the categorical variables:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
#Change the text into numbers 0,1,2 – 4th column
X[: ,3]= labelencoder_X.fit_transform(X[: ,3])
#create dummy variables
from sklearn.compose import ColumnTransformer
transformer = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [3])], remainder='passthrough')
#Now a little fit and transform
X = np.array(transformer.fit_transform(X), dtype=float)
#4 Avoid the dummy variables trap
#Delete the first column represent the New York
X= X[:, 1:]

#Split into training and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

#Train the Algorithm
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
#The y_pred is a numpy array that contains all predicted values
#compare actual output values for X_test with predicted values
output_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
print("Actual v Predicted: \n", output_df)
####
import numpy as np
from sklearn import metrics
explained_variance=metrics.explained_variance_score(y_test, y_pred)
mean_absolute_error=metrics.mean_absolute_error(y_test, y_pred)
mse=metrics.mean_squared_error(y_test, y_pred)
mean_squared_log_error=metrics.mean_squared_log_error(y_test, y_pred)
median_absolute_error=metrics.median_absolute_error(y_test, y_pred)
r2=metrics.r2_score(y_test, y_pred)
print('Explained_variance: ', round(explained_variance, 2))
print('Mean_Squared_Log_Error: ', round(mean_squared_log_error, 2))
print('R-squared: ', round(r2, 4))
print('Mean Absolute Error (MAE): ', round(mean_absolute_error, 2))
print('Mean Squared Error (MSE): ', round(mse, 2))
print('Root Mean Squared Error (RMSE): ', round(np.sqrt(mse), 2))
from statsmodels.api import OLS
import statsmodels.api as sm
#In our model, y depends on 2 things: the coefficients
#and a constant, so we need to add an additional column
#in X for the constant value
X = sm.add_constant(X)
summ = OLS(y, X).fit().summary()
print("Summary of the dataset: \n", summ)

Output:

In the above table, x1 and x2 are the dummy variables for State, x3 is R&D Spend, x4 is Administration, and x5 is Marketing Spend.

How many independent variables to consider?

We need to choose carefully which variables to keep as inputs. We do not want to include all of them, for mainly 2 reasons:

  1. GIGO: If we feed garbage to our model, we will get garbage out, so we need to feed in the right set of data.
  2. Justifying the input: Can we justify the inclusion of each variable? If not, we should not include it.

There are 4 methods to build a multiple linear model:

  1. Select all in
  2. Backward Elimination
  3. Forward Selection
  4. Bidirectional Elimination

Select-all-in: We select all the independent variables, either because we know that all of them impact the result or because business leaders want us to include them.

Backward Elimination:

  1. Select a significance level to stay in the model (e.g. SL = 0.05; predictors with higher P-values will be removed).
  2. Fit the full model with all possible predictors.
  3. Consider the predictor with the highest P-value. If P > SL, go to step 4; otherwise go to step 5.
  4. Remove the predictor, refit the model, and go back to step 3.
  5. Your model is ready!
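As a rough sketch (the article itself removes predictors by hand in a later section), the loop below automates these steps with statsmodels, assuming column 0 of X is the constant term and y is the target:

import numpy as np
import statsmodels.api as sm

def backward_elimination(X, y, significance_level=0.05):
    # Keep dropping the predictor with the highest p-value until
    # every remaining p-value is at or below the significance level.
    columns = list(range(X.shape[1]))
    while True:
        model = sm.OLS(y, X[:, columns]).fit()
        pvalues = np.array(model.pvalues)
        pvalues[0] = 0.0              # never drop the constant (column 0)
        worst = int(np.argmax(pvalues))
        if pvalues[worst] <= significance_level:
            return columns, model
        del columns[worst]

# Example usage with the X (including the constant) and y built earlier:
# kept_columns, final_model = backward_elimination(X, y)
# print(final_model.summary())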

Forward Selection:

  1. Select a significance level to enter the model (e.g. SL = 0.05; predictors with lower P-values are kept).
  2. Fit all possible simple regression models and select the one with the lowest P-value.
  3. Keep this variable and fit all possible models with one extra predictor added to the ones you already have (now running two-variable linear regressions).
  4. Consider the new predictor with the lowest P-value. If P < SL, go back to step 3; otherwise go to the next step.
  5. Keep the previous model!

Bi-directional Selection: It is a combination of Forward selection and backward elimination:

  1. Select a significance level to enter and a significance level to stay in the model (e.g. SLE = SLS = 0.05).
  2. Perform the next step of Forward Selection (new variables must have P < SLE).
  3. Perform all the steps of Backward Elimination (old variables must have P < SLS).
  4. Iterate between steps 2 and 3 until no new variables can enter and no old variables can exit.

In the multiple regression example, since we have already run the model with all the attributes, let's apply the backward elimination method here and remove the attributes that are not useful to us. Let's have another look at the stats summary:

Look for the highest p-value and remove that variable. Here x2 (the second dummy variable) has the highest p-value (0.990). We will remove this variable from X and re-run the model.

X_opt= X[:, [0,1,3,4,5]]
regressor_OLS=sm.OLS(endog = y, exog = X_opt).fit()
summ =regressor_OLS.summary()
print("Summary of the dataset after elimination 1: \n", summ)

Output Snapshot:

Look at the highest p-value again. The first dummy variable, x1, has a p-value of 0.940, so remove it. Even though this value was also high in the previous step, the algorithm removes only one variable at a time, because removing an attribute can affect the p-values of the other attributes. Re-run the code again:

X_opt= X[:, [0,3,4,5]]
regressor_OLS=sm.OLS(endog = y, exog = X_opt).fit()
summ = regressor_OLS.summary()
print("Summary of the dataset after elimination 2: \n", summ)

Admin spend (x2) now has the highest p-value (0.602). Remove this as well.

X_opt= X[:, [0,3,5]]
regressor_OLS=sm.OLS(endog = y, exog = X_opt).fit()
summ = regressor_OLS.summary()
print("Summary of the dataset after elimination 3: \n", summ)

Marketing spend (x2) now has the highest p-value (0.060). This value is low, but since we selected a significance level (SL) of 0.05, we need to remove it as well.

X_opt= X[:, [0,3]]
regressor_OLS=sm.OLS(endog = y, exog = X_opt).fit()
summ =regressor_OLS.summary()
print("Summary of the dataset after elimination 4: \n", summ)

Finally, we see that only one factor has a significant impact on profit: R&D spend has the highest impact on the profit of these startups. The accuracy of the model has also improved; when we included all the attributes, the R-squared value was 0.9347, and now it is at 0.947.

The word “linear” in “multiple linear regression” refers to the fact that the model meets all the criteria discussed in the next section.

Test for Linearity

The next question we need to answer is when we can and cannot perform Linear Regression. In this section, let's understand the assumptions of linear regression in detail. One of the most essential steps to take before applying linear regression, rather than relying solely on accuracy scores, is to check these assumptions; only when a dataset meets them can we say it is suitable for a linear regression model.

For the analysis, we will take the same dataset we used for Multiple Linear Regression Analysis in the previous section.

import pandas as pd
#Dataset
data_df = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/3_Startups.csv")

Before we apply regression on all these attributes, we need to understand whether we really need to take all of them into consideration. There are two things we need to consider:

The first step is to test whether the dataset fits the linearity definition, for which we will perform different tests in this section. Remember, we only test the numerical columns; the categorical columns are not taken into account. As we know, categorical values are converted into dummy variables with values 0 and 1, and dummy variables meet the assumption of linearity by definition, because they create only two data points, and two points define a straight line. There is no such thing as a non-linear relationship for a variable with only two values.

Code for Prediction: Let’s rewrite the code

import numpy as np
X = data_df.iloc[:,:-1].values
y = data_df.iloc[:,-1].values

#handling categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
le_x = LabelEncoder()
X[:,3] = le_x.fit_transform(X[:,3])
from sklearn.compose import ColumnTransformer
transformer = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [3])], remainder='passthrough')
X = np.array(transformer.fit_transform(X), dtype=float)
X=X[:,1:]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=0)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train , y_train)
y_pred = regressor.predict(X_test)
output_df = pd.DataFrame({"Actual": y_test, "Predicted": y_pred})

Let’s perform the tests now:

1. Linearity

Linear regression needs the relationship between the independent and dependent variables to be linear. Let’s use a pair plot to check the relation of independent variables with the profit variable.

Output:

Python Code:

import seaborn as sns
import matplotlib.pyplot as plt

# visualize the relationship between the features and the response using scatterplots
p = sns.pairplot(data_df, x_vars=['R&D Spend', 'Administration', 'Marketing Spend'], y_vars='Profit', height=5, aspect=0.7)
plt.show()

Looking at the plots, we can see that R&D Spend forms a clearly linear shape and Marketing Spend is somewhat linear, while Administration Spend is scattered all over the graph but still shows an increasing trend as the Profit value increases on the y-axis. Here we can use Linear Regression models.

2. Variables follow a Normal Distribution

The variables (X) follow a normal distribution. In other words, we want to make sure that for each x value, y is a random variable following a normal distribution whose mean lies on the regression line. One of the ways to visually test this assumption is through a Q-Q plot. Q-Q stands for Quantile-Quantile plot and is a technique to compare two probability distributions visually. To generate this Q-Q plot we will use scipy's probplot function, where we compare a variable of our choice to a normal distribution.

import scipy.stats as stats
stats.probplot(X[:, 2], dist="norm", plot=plt)  # column index 2 is R&D Spend after dropping the first dummy
plt.show()

The points must lie close to the red line to conclude that the variable follows a normal distribution. For the selected column, R&D Spend, they do. A couple of points fall outside the line because of our small sample size. In practice, you decide how strict you want to be, as this is a visual test.

3. There is no or little multicollinearity

Multicollinearity means that the independent variables are highly correlated with each other. X’s are called independent variables for a reason. If multicollinearity exists between them, they are no longer independent and this generates issues when modeling linear regressions.

To visually test for multicollinearity we can use the power of pandas. We will use the pandas corr function to compute the pairwise correlation of our columns. If any pair of variables has an absolute correlation of 0.8 or more, the multicollinearity assumption is being broken.

#convert to a pandas dataframe
import pandas as pd
df = pd.DataFrame(X)
df.columns = ['x1', 'x2', 'x3', 'x4', 'x5']
#generate correlation matrix
corr = df.corr()
#Plot a heat map of the pairwise correlations
p = sns.heatmap(corr, annot=True, cmap='RdYlGn', square=True)
print("Correlation Matrix:\n", corr)

4. Check for Homoscedasticity: The data needs to be homoscedastic (meaning the residuals are equal across the regression line). Homoscedasticity means that the residuals have equal or almost equal variance across the regression line. By plotting the error terms against the predicted values we can check that there is no pattern in the error terms.

#produce regression plots
from statsmodels.api import OLS
import statsmodels.api as sm
X = sm.add_constant(X)
model = OLS(y, X).fit()
summ = model.summary()
print("Summary of the dataset: \n", summ)
fig = plt.figure(figsize=(12,8))
#Checking for x3 (R&D Spend)
fig = sm.graphics.plot_regress_exog(model, 'x3', fig=fig)
plt.show()

Four plots are produced. The one in the top right corner is the residual vs. fitted plot. The x-axis on this plot shows the actual values for the predictor variable and the y-axis shows the residual for that value. Since the residuals appear to be randomly scattered around zero, this is an indication that heteroscedasticity is not a problem for the predictor variable x3 (R&D Spend). For multiple regression, we need to create this plot for each of the predictor variables.

5. Mean of Residuals

Residuals as we know are the differences between the true value and the predicted value. One of the assumptions of linear regression is that the mean of the residuals should be zero. So let’s find out.

residuals = y_test-y_pred
mean_residuals = np.mean(residuals)
print("Mean of Residuals {}".format(mean_residuals))

Output:

Mean of Residuals 3952.010244810798

6. Check for Normality of error terms/residuals

p = sns.histplot(residuals, kde=True, stat="density")  # distplot is deprecated in recent seaborn versions
p = plt.title('Normality of error terms/residuals')
plt.show()

The residual terms are pretty much normally distributed for the number of test points we took.

Restricted Boltzmann Machine and Its Application

A restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. Restricted Boltzmann machines can also be used in deep learning networks. In particular, deep belief networks can be formed by “stacking” RBMs and optionally fine-tuning the resulting deep network with gradient descent and backpropagation. This deep learning algorithm became very popular after the Netflix Competition where RBM was used as a collaborative filtering technique to predict user ratings for movies and beat most of its competition. It is useful for regression, classification, dimensionality reduction, feature learning, topic modelling and collaborative filtering.

Restricted Boltzmann Machines are stochastic two-layered neural networks that belong to the category of energy-based models; they can automatically detect inherent patterns in the data by reconstructing the input. They have two layers, visible and hidden. The visible layer holds the input nodes (nodes that receive the input data), and the hidden layer is formed by nodes that extract feature information from the data; the output at the hidden layer is a weighted sum of the inputs. RBMs don't have output nodes and don't produce the typical labelled outputs through which patterns are usually learnt; the learning process happens without them, which is what makes RBMs different. We only take care of the input nodes and don't worry about the hidden nodes: once the input is provided, the RBM automatically captures the patterns, parameters and correlations in the data.

What is Boltzman Machine?

Let's first understand what a Boltzmann Machine is. The Boltzmann Machine was first invented in 1985 by Geoffrey Hinton, a professor at the University of Toronto. He is a leading figure in the deep learning community and is referred to by some as the "Godfather of Deep Learning".

Boltzmann Machine
  • Boltzmann Machine is a generative unsupervised model, which involves learning a probability distribution from an original dataset and using it to make inferences about never before seen data.
  • Boltzmann Machine has an input layer (also referred to as the visible layer) and one or several hidden layers (also referred to as the hidden layer).
  • Boltzmann Machine uses neural networks with neurons that are connected not only to other neurons in other layers but also to neurons within the same layer.
  • Everything is connected to everything: connections are bidirectional, visible neurons are connected to each other, and hidden neurons are also connected to each other.
  • A Boltzmann Machine doesn't expect input data; it generates data. Neurons generate information regardless of whether they are hidden or visible.
  • For a Boltzmann Machine all neurons are the same; it doesn't discriminate between hidden and visible neurons. To the Boltzmann Machine, the whole thing is one system, and it generates the states of that system.

In a Boltzmann Machine, we feed our training data into the machine as input to help the system adjust its weights. It comes to resemble our specific system, not just any system in the world. It learns from the input what the possible connections between all these parameters are and how they influence each other, and it thereby becomes a machine that represents our system. A Boltzmann Machine consists of a neural network with an input layer and one or several hidden layers. The neurons in the neural network make stochastic decisions about whether to turn on or off based on the data we feed during training and the cost function the Boltzmann Machine is trying to minimize. By doing so, the Boltzmann Machine discovers interesting features about the data, which help model the complex underlying relationships and patterns present in the data.

The Boltzmann Machine uses neural networks with neurons that are connected not only to neurons in other layers but also to neurons within the same layer. That makes training an unrestricted Boltzmann machine very inefficient, and the Boltzmann Machine had very little commercial success.
Boltzmann Machines are primarily divided into two categories: Energy-based Models (EBMs) and Restricted Boltzmann Machines (RBM). When these RBMs are stacked on top of each other, they are known as Deep Belief Networks (DBN). Our focus of discussion here is the RBM.

Restricted Boltzmann Machines (RBM)

Restricted Boltzman Machine Algorithm
  • What makes RBMs different from Boltzmann machines is that the visible nodes aren't connected to each other and the hidden nodes aren't connected to each other. Other than that, RBMs are exactly the same as Boltzmann machines.
  • It is a probabilistic, unsupervised, generative deep machine learning algorithm.
  • RBM’s objective is to find the joint probability distribution that maximizes the log-likelihood function.
  • RBM is undirected and has only two layers, Input layer, and hidden layer
  • All visible nodes are connected to all the hidden nodes. An RBM has two layers, the visible (input) layer and the hidden layer, so it can also be described as a bipartite graph.
  • No intralayer connection exists between the visible nodes. There is also no intralayer connection between the hidden nodes. There are connections only between input and hidden nodes.
  • The original Boltzmann machine had connections between all the nodes. Since RBM restricts the intralayer connection, it is called a Restricted Boltzmann Machine.
  • Since RBMs are undirected, they don’t adjust their weights through gradient descent and backpropagation. They adjust their weights through a process called contrastive divergence. At the start of this process, weights for the visible nodes are randomly generated and used to generate the hidden nodes. These hidden nodes then use the same weights to reconstruct visible nodes. The weights used to reconstruct the visible nodes are the same throughout. However, the generated nodes are not the same because they aren’t connected to each other.

Simple Understanding of RBM

Problem Statement: Let's take the example of a small café just across the street where people come in the evening to hang out. We see that three people, Geeta, Meeta and Paavit, visit frequently, but they don't always show up together. All possible combinations of these three people showing up are valid: just Geeta, Meeta or Paavit on their own, Geeta and Meeta at the same time, Paavit and Meeta, Paavit and Geeta, all three of them, or none of them on some days.

Left to Right: Geeta, Meeta, Paavit

Let's say you watch them coming every day and make a note of it. On the first day, Meeta and Geeta come and Paavit doesn't. On the second day, Paavit comes but Geeta and Meeta don't. After observing for 15 days, you find that only these two combinations are repeated, as represented in the table.

Visits of Geeta, Meeta and Paavit to the cafe

That's an interesting finding, and even more so when we come to know that these three people are totally unknown to each other. You also find out that there are two café managers: Ratish and Satish. Let's tabulate it again with 5 people now (3 visitors and 2 managers).

Visits of customer and presence of manager on duty

We find that Geeta and Meeta like Ratish, so they show up when Ratish is on duty. Paavit likes Satish, so he shows up only when Satish is on duty. So, looking at the data, we might say that Geeta and Meeta went to the café on the days Ratish was on duty and Paavit went when Satish was on duty. Let's add some weights.

Customers and Managers relation with weights

Since the customers appear in our dataset, we call them the visible layer. The managers do not appear in the dataset, so we call them the hidden layer. This is an example of a Restricted Boltzmann Machine (RBM).

RBM


Working of RBM

An RBM is a Stochastic Neural Network, which means that each neuron has some random behavior when activated. There are two other layers of bias units (hidden bias and visible bias) in an RBM. This is what makes RBMs different from autoencoders. The hidden bias helps the RBM produce the activations on the forward pass, and the visible bias helps the RBM reconstruct the input during the backward pass. The reconstructed input is always different from the actual input, as there are no connections among the visible units and therefore no way of transferring information among themselves.

Step 1

The above image shows the first step in training an RBM with multiple inputs. The inputs are multiplied by the weights and then added to the bias. The result is then passed through a sigmoid activation function and the output determines if the hidden state gets activated or not. Weights will be a matrix with the number of input nodes as the number of rows and the number of hidden nodes as the number of columns. The first hidden node will receive the vector multiplication of the inputs multiplied by the first column of weights before the corresponding bias term is added to it.

Here is the formula of the sigmoid function, in standard notation:
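S(x) = \frac{1}{1 + e^{-x}}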

So the equation that we get in this step, written with the vectors defined below, is:
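h^{(1)} = S\left(W^{T} v^{(0)} + a\right)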

where h(1) and v(0) are the corresponding vectors (column matrices) for the hidden and the visible layers, with the superscript indicating the iteration (v(0) means the input that we provide to the network), and a is the hidden layer bias vector.

(Note that we are dealing with vectors and matrices here and not one-dimensional values.)

The reverse phase, or the reconstruction phase, is similar to the first pass but in the opposite direction. The equation, in the same notation, comes out to be:
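v^{(1)} = S\left(W h^{(1)} + b\right)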

where v(1) and h(1) are the corresponding vectors (column matrices) for the visible and the hidden layers with the superscript as the iteration and b is the visible layer bias vector.

Now, the difference v(0)−v(1) can be considered as the reconstruction error that we need to reduce in subsequent steps of the training process. So the weights are adjusted in each iteration so as to minimize this error and this is what the learning process essentially is.

In the forward pass, we are calculating the probability of output h(1) given the input v(0) and the weights W denoted by:
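p\left(h^{(1)} \mid v^{(0)}; W\right)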

And in the backward pass, while reconstructing the input, we are calculating the probability of output v(1) given the input h(1) and the weights W denoted by:
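p\left(v^{(1)} \mid h^{(1)}; W\right)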

The weights used in both the forward and the backward pass are the same. Together, these two conditional probabilities lead us to the joint distribution of inputs and the activations:
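p\left(v^{(0)}, h^{(1)}\right), which for an energy-based model such as the RBM can be written in the standard form p(v, h) = \frac{1}{Z} e^{-E(v, h)}, where E(v, h) is the energy function and Z is the normalizing constant (the partition function).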

Reconstruction is different from regression or classification in that it estimates the probability distribution of the original input instead of associating a continuous/discrete value to an input example. This means it is trying to guess multiple values at the same time. This is known as generative learning as opposed to discriminative learning that happens in a classification problem (mapping input to labels).

Let us try to see how the algorithm reduces loss or simply put, how it reduces the error at each step. Assume that we have two normal distributions, one from the input data (denoted by p(x)) and one from the reconstructed input approximation (denoted by q(x)). The difference between these two distributions is our error in the graphical sense and our goal is to minimize it, i.e., bring the graphs as close as possible. This idea is represented by a term called the Kullback–Leibler divergence.
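For reference, the KL divergence between distributions p and q is defined as

D_{KL}(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}

(with the sum replaced by an integral for continuous distributions).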

KL-divergence measures the non-overlapping areas under the two graphs and the RBM’s optimization algorithm tries to minimize this difference by changing the weights so that the reconstruction closely resembles the input. The graphs on the right-hand side show the integration of the difference in the areas of the curves on the left.

This gives us intuition about our error term. Now, to see how actually this is done for RBMs, we will have to dive into how the loss is being computed. All common training algorithms for RBMs approximate the log-likelihood gradient given some data and perform gradient ascent on these approximations.

Contrastive Divergence

Here is the pseudo-code for the CD algorithm:

CD Algorithm pseudo code
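A minimal NumPy sketch of single-step contrastive divergence (CD-1) for binary units, written in the notation used above (W weights, a hidden bias, b visible bias), looks roughly like this; it is an illustration of the idea rather than a reproduction of the exact pseudo-code listing:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.1, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    # Positive phase: hidden probabilities and a sampled hidden state from the data
    p_h0 = sigmoid(v0 @ W + a)                 # p(h = 1 | v0)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: reconstruct the visible units, then recompute hidden probabilities
    p_v1 = sigmoid(h0 @ W.T + b)               # p(v = 1 | h0)
    p_h1 = sigmoid(p_v1 @ W + a)               # p(h = 1 | v1)
    # Update rule: difference between data-driven and reconstruction-driven statistics
    W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    a += lr * (p_h0 - p_h1)
    b += lr * (v0 - p_v1)
    return W, a, b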

Applications:
* Pattern recognition: RBMs are used for feature extraction in pattern recognition problems where the challenge is to understand handwritten text or a random pattern.
* Recommendation engines: RBMs are widely used in collaborative filtering, where they predict what should be recommended to the end user so that the user enjoys using a particular application or platform, for example movie or book recommendations.
* Radar target recognition: Here, RBMs are used for intra-pulse detection in radar systems that have very low SNR and high noise.

Source: wikipedia (https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine)

Understanding the term Data Science

Another article talking about the basic concept of machine learning and data science.

As the world entered the era of big data, the need to store it grew as well. Until 2010, storage was the main challenge and concern for enterprise businesses, and the primary focus was on developing frameworks and data storage systems.
Now that storage is no longer the main hurdle, the focus has shifted to processing this data, and the secret sauce here is data science. Data Science can make many of the ideas you see in Hollywood sci-fi movies a reality. The future of Artificial Intelligence lies in Data Science. As a result, it is essential to understand what Data Science is and how it might benefit your company.


What is Data Science?
Data Science is a collection of tools, techniques, and machine learning principles that aim to uncover hidden patterns in raw data. But how does this differ from what statistical methods have done for years? As shown in the preceding image, a Data Analyst typically explains what is going on by tracing the data's processing history.
A Data Scientist, on the other hand, not only performs exploratory analysis to glean insights from the data, but also employs a variety of advanced machine learning techniques to predict the occurrence of a specific event in the future. A Data Scientist will examine the data from a variety of perspectives, including some that were previously unknown.


Do you need a data science certificate?

A certification on your resume is unlikely to help you land a job on its own. Employers are interested in the skills you possess. A certification, by itself, tells an employer nothing about your abilities; it simply informs them that you studied a subject. Certifications, on the other hand, can be extremely valuable if they effectively teach you the skills you require.
Certification programmes and platforms can still be a great investment, but remember that their value lies in the skills they teach you. Employers will look at your skills, project portfolio, and transferable experience when they review your resume. A certificate alone is unlikely to sway their decision, so focus on developing the necessary skills and creating interesting projects.


Why is data science important?
Data science is essential in almost every aspect of business and strategy. For example, it provides insights about consumers that enable businesses to create more effective marketing plans and targeted advertising in order to increase product sales. It aids in managing financial risks, detecting fraudulent transactions, and preventing equipment failures in production facilities and other industrial sites.
It helps block cyber-attacks and other security threats in IT systems. Data science initiatives can improve operational management across supply chains, product inventory levels, distribution channels, and customer service. On a more basic level, they point the way toward greater efficiency and lower costs.
Data science also allows enterprises to develop strategic plans based on an in-depth analysis of customer behavior, market trends, and competition. Without it, business owners risk missing out on opportunities and making bad decisions.


Challenges in data science:
Due to the advanced nature of the analytics involved, data science is especially challenging. The massive amounts of data that are typically analyzed add to the complexity and lengthen the time it takes to complete tasks. Furthermore, data scientists regularly work with pools of big data that may encompass a mix of structured, unstructured, and semi-structured data, which complicates the analytics process even further.
Removing bias from data sets and advanced analytics is one of the most difficult challenges. This includes both problems with the original data and problems that data scientists unconsciously build into algorithms and prescriptive models. If such biases are not identified, they can skew analytics results, leading to flawed findings and poor business decisions. Worse, they can have a negative impact on specific groups of people, as in the case of racial bias in AI technologies.


Why do businesses need Data Science?
We've progressed from working with small sets of structured data to huge volumes of unstructured and semi-structured data coming in from a variety of sources. When it comes to processing this huge pool of unstructured information, conventional Business Intelligence tools fall short.
As a result, Data Science includes more sophisticated tools for working with large volumes of data from various sources such as financial logs, multimedia content, marketing forms, sensors and instruments, and text files.


What does a data scientist really do?
Data scientists create and use algorithms to analyze data. In general, this process entails using and developing machine learning tools and personalized data products to help businesses and clients interpret data in a useful manner.
They also help break down data-driven reports in order to gain a better understanding of clients. Overall, data scientists are involved in every stage of data processing, from cleaning and processing it to building and maintaining infrastructure, and from testing to analyzing it for real-world applications.

All this and more is available in the book “Data Science and Machine Learning with Python” by Swapnil Saurav, available on Amazon here.

You can find more articles by the author on the topic here.

What is artificial intelligence?

Continuing our earlier discussions, and covering basic terms before we get into higher-level discussions, in this article I talk about what AI is from my perspective.

Artificial intelligence (AI) is a subfield of computer science concerned with simulating human intelligence in machines. Another way to think about AI is as a quest to create machines that can perform specific tasks that normally require human intelligence. AI has the potential to free us from monotonous tasks, make quick and accurate decisions, act as a catalyst for accelerating discoveries and inventions, and even complete dangerous processes in extreme environments. There is no magic involved.
AI is a group of intelligent algorithms attempting to mimic human intellect. It employs techniques such as machine learning and deep learning to learn from data and improve continuously. And AI is more than just a subfield of computer science; rather, it incorporates elements of statistics, mathematics, intelligent systems, neuroscience, cybernetics, psychology, linguistics, philosophy, economics, and other disciplines.


Different types of AI:
At a very high level, artificial intelligence can be split into two broad types:

Narrow AI:
Narrow AI is what we see all around us in computing systems today. It refers to intelligent systems that have been taught, or have learned, how to perform specific tasks without being explicitly programmed to do so. This class of machine intelligence is visible in the speech and language recognition of the Siri digital assistant on the Apple iPhone, in the vision systems of self-driving cars, and in recommendation engines that suggest products you might like based on what you have purchased in the past. Unlike humans, these systems can only learn or be taught how to perform specific tasks, which is why they are referred to as narrow AI.

General AI:
General AI is very different: it is the kind of adaptable intellect found in humans, a flexible form of intelligence capable of learning how to carry out vastly different tasks, such as hairdressing, building spreadsheet applications, or reasoning about a wide range of topics based on its accumulated experience. This is the type of AI seen in films, such as HAL in 2001 or Skynet in The Terminator, but it does not exist today, and AI specialists are divided on how soon it will become a reality.

How Artificial Intelligence (AI) Works?
Constructing an AI system is a meticulous process that involves reverse-engineering human characteristics and abilities in a machine and then using its computing prowess to surpass what we are capable of. To comprehend how Artificial Intelligence works, one must delve into the various subdomains of AI and understand how those domains can be applied to different areas of business. You can also enroll in an artificial intelligence course to develop a full understanding.

What is the Purpose of Artificial Intelligence?
The goal of Artificial Intelligence is to augment human ability and assist us in making complex decisions with far-reaching consequences. That is the answer from a technical standpoint. From a philosophical standpoint, Artificial Intelligence has the potential to help humans live better lives free of hard labor, as well as to help manage the complex web of interconnected individuals, businesses, states, and nations so that it functions in a way that benefits all of civilization.
Currently, this purpose of Artificial Intelligence is shared by all the various techniques we've created over the years: to optimize human effort and assist us in making important choices. Artificial Intelligence has also been dubbed our “Final Invention”, a creation that would produce ground-breaking tools that could dramatically change how we live our lives, ideally eradicating conflict, inequality, and human misery.

Where is Artificial Intelligence (AI) Used?
AI is used in a variety of domains to provide insights into user behavior and make recommendations based on data. Google's predictive search algorithm, for example, uses past user data to determine what a user will type next in the search field. Netflix uses past user data to recommend what movie a user should watch next, keeping the user on the platform and increasing watch time. Facebook uses past user data to automatically suggest tags for your friends based on their facial features in your images. AI is being used by large organizations all over the world to make the lives of end users easier.
AI can be used to quickly and conveniently complete tasks that humans find tedious, such as sorting through huge amounts of data and recognizing patterns. It also enables machines to become smarter in ways that make them more convenient, easier to use, and capable of accomplishing more than ever before. This applies to our personal devices and larger industrial technologies, as well as virtually everything in between.

All this and more is available in the book “Data Science and Machine Learning with Python” by Swapnil Saurav, available on Amazon here.

You can find more articles by the author on the topic here.