Blog - Swapnil Saurav

Learning

Jul 31 2022

QualityThought Learn ML – Code

JULY 31 2022: SQL Programming

select * from olym.olym_events

select * from olym.olym_base_events

select * from olym.olym_disciplines

select ID, SPORT from olym.olym_sports

select S.ID, S.SPORT, D.Discipline from olym.olym_sports S, olym.olym_disciplines D WHERE S.ID = D.sport_id

select * from olym.olym_medals_view where Edition=1996 and discipline='Tennis' and Gender='Men' and event = 'singles'
select * from olym.olym_medals_view where Edition>1950 and NOC='IND' and athlete like '%KU%' order by edition desc
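The sports and disciplines join above can also be written with an explicit JOIN ... ON clause. Below is a minimal sketch using Python's sqlite3 with an in-memory database. The table names mirror the notes, but the rows are made up for illustration and the olym schema prefix is dropped, so treat it as a practice example rather than the real Olympics data.

```python
import sqlite3

# In-memory database with made-up rows (assumption: the real olym tables
# have more columns; only the ones used in the join are recreated here).
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE olym_sports (ID INTEGER PRIMARY KEY, SPORT TEXT)")
cur.execute(
    "CREATE TABLE olym_disciplines "
    "(ID INTEGER PRIMARY KEY, DISCIPLINE TEXT, SPORT_ID INTEGER)"
)
cur.executemany("INSERT INTO olym_sports VALUES (?, ?)",
                [(1, "Aquatics"), (2, "Tennis")])
cur.executemany("INSERT INTO olym_disciplines VALUES (?, ?, ?)",
                [(10, "Swimming", 1), (11, "Diving", 1), (12, "Tennis", 2)])

# Same result as the comma-style join, with the join condition in ON
query = """
SELECT S.ID, S.SPORT, D.DISCIPLINE
FROM olym_sports S
JOIN olym_disciplines D ON S.ID = D.SPORT_ID
ORDER BY D.ID
"""
joined_rows = cur.execute(query).fetchall()
for row in joined_rows:
    print(row)
con.close()
```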

Date: AUGUST 1 2022

import sqlite3
#import pymysql  #needed only when connecting to MySQL instead of SQLite
con = sqlite3.connect('library.db')  #1. Make connection to your Database
dbobj = con.cursor()
command = '''
Create Table If Not Exists Books(
BOOKID INTEGER PRIMARY KEY,
TITLE TEXT,
PRICE REAL,
COPIES INTEGER
)
'''
dbobj.execute(command)  #create the table before any insert or select
command = '''
Insert Into BOOKS(
BookID, Title, Price,Copies
) values(3, 'Practice Machine Learning',410.25, 18)
'''
#dbobj.execute(command)
#con.commit()

command = '''
Delete from Books where Bookid=2
'''
#dbobj.execute(command)
#con.commit()

command = '''
Update Books set copies = 9 where Bookid=3
'''
#dbobj.execute(command)
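command = '''
Select * from Books
'''
dbobj.execute(command)
records = dbobj.fetchall()
current_count = 0  #stays 0 when book 3 is not in the table
for r in records:
    print(r[3])  #each row is a tuple; index 3 is COPIES
    if r[0] == 3:
        current_count = r[3]

if current_count >0:
    #pass the new value as a parameter instead of formatting it into the SQL text
    command = '''
    Update Books set copies = ? where Bookid=3
    '''
    dbobj.execute(command, (current_count-1,))
    con.commit()
else:
    print("Sorry, we do not have any copies left")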
print("After removing one value: ")
command = '''
Select * from Books where BookID=3
'''
dbobj.execute(command)
records = dbobj.fetchall()
for r in records:
    #current_count = r[3]
    print(r)
    #print(r[3])  #Tuple
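The issue-a-book step above can be wrapped in a small function. This is a sketch only: the function name issue_book is hypothetical, and it uses an in-memory database so it can be run without touching library.db.

```python
import sqlite3

def issue_book(con, book_id):
    """Reduce the copy count of one book by 1; return True if a copy was issued."""
    cur = con.cursor()
    cur.execute("SELECT COPIES FROM Books WHERE BOOKID = ?", (book_id,))
    row = cur.fetchone()
    if row is None or row[0] <= 0:
        return False  # unknown book, or no copies left
    cur.execute("UPDATE Books SET COPIES = ? WHERE BOOKID = ?",
                (row[0] - 1, book_id))
    con.commit()
    return True

# In-memory stand-in for library.db, holding a single copy of book 3
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Books(BOOKID INTEGER PRIMARY KEY, TITLE TEXT, "
            "PRICE REAL, COPIES INTEGER)")
con.execute("INSERT INTO Books VALUES (?, ?, ?, ?)",
            (3, "Practice Machine Learning", 410.25, 1))
con.commit()

first_try = issue_book(con, 3)    # takes the only copy
second_try = issue_book(con, 3)   # nothing left to issue
copies_left = con.execute(
    "SELECT COPIES FROM Books WHERE BOOKID = 3").fetchone()[0]
print(first_try, second_try, copies_left)
```

The ? placeholders let sqlite3 handle quoting, which is safer than building the SQL text with % formatting.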

while True:
    print("1. Add a New Book")
    print("2. Issue a Book")
    print("3. Display all books")
    print("4. Return a book")
    print("5. Exit")
    ch=int(input("Enter your choice: "))
    if ch == 5:
        break  #leave the menu loop; options 1 to 4 are still to be written

AUGUST 2 2022

# File operations:
## Read - r  /  r+
## Write - w   /  w+
## Append - a
fileobj = open("files\\myfile.txt","a")
my_poem = '''HI
How are you today
You should be fine today
lets have a great day today
Enjoy your day today'''
print(fileobj.writable())
fileobj.write(my_poem)
fileobj.close()
fileobj = open("files\\myfile.txt","r")
output = fileobj.read(100)
fileobj.seek(0)
print(output)
output = fileobj.readline()
#output = fileobj.readlines()
print(output)
print("---------------")
fileobj.seek(49)
output = fileobj.read(10)
print(output)
fileobj.close()

## Read and remove vowels and save back

# File operations:
## Read - r  /  r+
## Write - w   /  w+
## Append - a
fileobj = open("files\\myfile.txt","w")
my_poem = '''HI
How are you today
You should be fine today
lets have a great day today
Enjoy your day today'''
fileobj.write(my_poem)
fileobj.close()
fileobj = open("files\\myfile.txt","r")
output = fileobj.read()
fileobj.close()
for i in "aeiouAEIOU":
    new_content = output.split(i)
    output = "".join(new_content)
fileobj = open("files\\myfile.txt","w")
fileobj.write(output)
fileobj.close()
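The split-and-join loop above works. As an alternative, here is a sketch that removes the vowels in one pass with str.translate and uses with blocks so the files are closed automatically. It writes to a temporary folder (an assumption made so the example runs anywhere) instead of files\\myfile.txt.

```python
import os
import tempfile

poem = "HI\nHow are you today\nEnjoy your day today"

# The third argument of str.maketrans lists characters to delete
remove_vowels = str.maketrans("", "", "aeiouAEIOU")

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "myfile.txt")
    with open(path, "w") as f:        # write the original text
        f.write(poem)
    with open(path, "r") as f:        # read it back
        content = f.read()
    stripped = content.translate(remove_vowels)
    with open(path, "w") as f:        # save the vowel-free text
        f.write(stripped)
    with open(path, "r") as f:
        saved = f.read()

print(saved)
```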

#JSON

{
    "Name": "Sachin Tendulkar",
    "Teams": ["Mumbai","MI","India"],
    "Kids": {
        "Name": ["Arjun", "Saara"],
        "Age": [23,25]
    }
}

#load /loads = read from json file
#dump /dumps = to write to json file
import json
txt = '{    "Name": "Sachin Tendulkar",     "Teams": ["Mumbai","MI","India"] , "Branch":["A","B","C"]   }'

jsonobj = json.loads(txt)

print(json.dumps(jsonobj, indent=5, sort_keys=True))
jsonfile = open("myjson.json","w")
json.dump(jsonobj, jsonfile,indent=5)
jsonfile.close()  #close so the content is flushed to disk

#Program to read a dictionary using loop and save the content as json in a file

txt = '{"name":"Rohit","teams":["MI","IND","M"]}'
data = json.loads(txt)  #convert the JSON text into a dictionary first
for key in data:
    print(key, ":", data[key])
f = open("files\\jsonfile.txt",'w')
json.dump(data, f,indent=5)
f.close()
f = open("files\\jsonfile.txt",'r')
content = json.load(f)
f.close()
print("After loading \n",content)
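The difference between dumping a dictionary and dumping a string is easy to miss, so here is a small round-trip sketch. It uses io.StringIO in place of a real file (an assumption, so that nothing is written to disk), and the variable names are only for illustration.

```python
import io
import json

player = {"name": "Rohit", "teams": ["MI", "IND", "M"]}

# Build a new dictionary by looping over the items, as the exercise asks
to_save = {}
for key, value in player.items():
    to_save[key] = value

buffer = io.StringIO()            # stands in for open(..., "w")
json.dump(to_save, buffer, indent=5)

buffer.seek(0)                    # rewind before reading back
loaded = json.load(buffer)
print(type(loaded), loaded)
```

Because a dictionary was dumped, json.load gives back a dictionary and not a plain string.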


AUGUST 4, 2022

try:
    num = int(input("Enter a number: "))
    a = 5
    b = 0
    val = a / b

except ValueError:
    print("Ending the execution of program because you have not entered a valid number")
except ZeroDivisionError:
    print("Zero division error, please retry")
except Exception:
    print("Not sure what but some error occurred")
finally:
    print("I am in finally")
print("Thank you")

# Errors:
#1. Syntax error
#2. Logical error
#3. Exception or runtime
#4. Exceptions: ZeroDivisionError, ValueError

while True:
    try:
        num1 = int(input("Enter first number: "))
        break
    except ValueError:
        print("Unknown error occurred, please try again")

while True:
    try:
        num2 = int(input("Enter second number: "))
        break
    except ValueError:
        print("Unknown error occurred, please try again")

sum = num1 + num2
print("Total: ",sum)


#WAP to input marks in 5 subjects and calculate total and average- use exception where necessary
class NegativeNumber(Exception):
    pass
try:
    num_marks = int(input("Total number of subjects: "))
    if num_marks <0:
        raise NegativeNumber
    total = 0
    for i in range(num_marks):
        while True:
            try:
                marks = int(input("Enter marks in subject " + str(i + 1) + ": "))
                break
            except ValueError:
                print("Invalid marks, try again!")
        total += marks
    avg = total / num_marks
    print("Total avg = ", avg)
except ValueError:
    print("Invalid input, exiting...")
except NegativeNumber:
    print("Sorry you are not allowed to enter Negative numbers, exiting...")
except ZeroDivisionError:
    print("Number of subjects cannot be zero, exiting...")
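The same idea can be tried without typing input each time. The sketch below takes the marks as a list of strings; the function name total_and_average is hypothetical and the marks are made up.

```python
class NegativeNumber(Exception):
    """Raised when a mark below zero is supplied."""

def total_and_average(raw_marks):
    """Convert a list of strings to marks and return (total, average)."""
    if not raw_marks:
        raise ValueError("at least one subject is required")
    total = 0
    for text in raw_marks:
        mark = int(text)              # raises ValueError for non-numeric text
        if mark < 0:
            raise NegativeNumber(text)
        total += mark
    return total, total / len(raw_marks)

result = total_and_average(["70", "80", "90", "60", "50"])
print(result)

try:
    total_and_average(["70", "-5"])
    outcome = "no error"
except NegativeNumber:
    outcome = "negative"
print(outcome)
```

Deriving the custom class from Exception (not BaseException) means a general except Exception handler can still catch it.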

6 AUGUST 2022

#lambda function
#anonymous function
l1 = lambda x,y : x*y
print(l1(5,4))

#map:
ls1 = [2,4,8,16,32,64]
ls2=[]
for i in ls1:
    ls2.append(i**2)
print(ls2)

#map in list
result = map(lambda x: x**2, ls1)
print(list(result))

#filter
ls3 = [2,4,6,8,10,12,15,18,20,25,28,30,40]
#I want multiples of 5- that means
#filter out those which are not multiples of five
filtered_val = list(filter(lambda x: x%5==0,ls3))
print(filtered_val)
filtered_val = list(filter(lambda x: x>=18,ls3))
print(filtered_val)

#reduce
ls3 = [2,4,6,8,10,12,15,18,20,25,28,30,40]
#cumulative sum:
sum=0
for i in ls3:
    sum+=i
print("Sum: ",sum)

from functools import reduce
sol = reduce(lambda x,y: x+y, ls3)
print(sol)

# take a list of values (c) and using map convert them into F
ls1 = [2,3,54,6,7,87,65]
print(list(map(lambda x : (x*(9/5)+32),ls1)))

#take a list of values and filter out values which are multiples of 3 and 7 only
ls2 = [3,6,21,34,42,63,65,78,189]
print(list(filter(lambda x : x%3==0 and x%7==0,ls2)))
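For comparison, map and filter can usually be written as list comprehensions, and a running sum with reduce matches the built-in sum. A short sketch with the same list of values as above:

```python
from functools import reduce

values = [2, 4, 6, 8, 10, 12, 15, 18, 20, 25, 28, 30, 40]

# map with a lambda vs. a list comprehension
squares_map = list(map(lambda x: x ** 2, values))
squares_comp = [x ** 2 for x in values]

# filter with a lambda vs. a comprehension with a condition
fives_filter = list(filter(lambda x: x % 5 == 0, values))
fives_comp = [x for x in values if x % 5 == 0]

# reduce vs. the built-in sum
total_reduce = reduce(lambda x, y: x + y, values)
total_builtin = sum(values)

print(squares_map == squares_comp, fives_filter == fives_comp)
print(fives_comp, total_reduce, total_builtin)
```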

8 AUGUST 2022

Program to read content from wikipedia page:

import requests
link = "https://en.wikipedia.org/wiki/List_of_Indian_people_by_net_worth"
website_content = requests.get(url=link).text
#print(website_content)
from bs4 import BeautifulSoup
s = BeautifulSoup(website_content,'lxml')
#print(s.prettify())
print(s.title.string)
#tables = s.find_all('table')
my_table = s.find('table', class_ = "wikitable sortable")
table_links = my_table.find_all('a')
#print(table_links)
rich_indians =[]
for l in table_links:
    rich_indians.append(l.get('title'))
rich_indians.pop(0)
rich_indians.pop(0)
print(rich_indians)

9 AUGUST 2022

#NUMPY - matrix like datastructure
import numpy as np
x = range(9)
print(type(x))
x = np.reshape(x,(3,3))
print(x)
print(type(x))
print("Shape of the numpy: ",x.shape)
y=[[2,3,4],[5,6,2],[3,7,4]]
y = np.array(y)
print(y)
print(y[0])
print(y[0,2])
print(y[:,2])

#dummy values to the numpy
z = np.zeros((4,4))
print(z)
z = np.ones((4,4))
print(z)
z = np.full((4,4),2)
print(z)
idm1 = np.identity(3, dtype=int)
print(idm1)
print("Operation")
x=[[5,1,0],[1,1,2],[3,0,4]]
x = np.array(x)
y=[[2,3,4],[5,6,2],[3,7,4]]
y = np.array(y)
print(x)
print(y)
#print(x+y)
#print(x-y)
#print(x*y)
print(x/y)
#for above operations both matrices should have same shape
#MATRIX MULTIPLICATION
## condition a *b  matmul m * n => b should be equal to m
x=[[5,1,0],[1,1,2],[3,0,4]]
x = np.array(x)
y=[[2,3,4],[5,6,2],[3,7,4]]
y = np.array(y)
print(x)
print(y)
z = np.matmul(x,y)
print(z)

#determinant
a= np.array([[23,14],[37,28]])
det_a = np.linalg.det(a)
print(det_a)
inv_b = np.linalg.inv(a)
print(inv_b)
print(np.matmul(a,inv_b))

a= np.array([[23,28],[23,28]])
det_a = np.linalg.det(a)
print(det_a)
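#Matrix with zero determinant, is singular matrix
try:
    inv_b = np.linalg.inv(a)  #raises LinAlgError for a singular matrix
    print(inv_b)
    print(np.matmul(a,inv_b))
except np.linalg.LinAlgError:
    print("Singular matrix: inverse does not exist")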

10 AUGUST 2022

# 3x +4y - 7z = 2
# -2x +y -z = -6
# x +y + z = 2
#form 3 matrices:
## Coefficient matrix
## Variable matrix
## Constant matrix
### Coefficient matrix X Variable Matrix = Constant Matrix
# 5X = 15  => X = 15/5
# => variable matrx = inverse of Coefficient matrix * Constant matrix
import numpy as np
coeff_matrix = np.array([[3,4,-7],[-2,1,-1],[1,1,1]])
cont_matrix = np.array([[2],[-6],[2]])
det_coeff = np.linalg.det(coeff_matrix)
if np.isclose(det_coeff, 0):  #det is a float, so compare with a tolerance
    print("Solution is not possible")
else:
    variable_mat = np.matmul(np.linalg.inv(coeff_matrix) , cont_matrix)
    print(variable_mat)
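To check the answer by hand, the same three equations can be solved exactly with Cramer's rule and fractions. This is a different technique from the inverse-matrix approach above, shown only as a cross-check; the helper det3 is a hypothetical name.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

coeff = [[3, 4, -7], [-2, 1, -1], [1, 1, 1]]
const = [2, -6, 2]

det_coeff = det3(coeff)
solution = []
if det_coeff == 0:
    print("No unique solution")
else:
    for col in range(3):
        # Replace one column of the coefficient matrix with the constants
        replaced = [row[:col] + [const[r]] + row[col + 1:]
                    for r, row in enumerate(coeff)]
        solution.append(Fraction(det3(replaced), det_coeff))
    print("x, y, z =", solution)

# Substitute back into the left-hand sides to confirm the constants
check = [sum(c * v for c, v in zip(row, solution)) for row in coeff]
print(check)
```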

AUGUST 11, 2022

# Permutation & Combination
# => selecting r things from n things
 ## in Permutation, order matters - 2 cases: with or without replacement
 ## in Combination, order doesn't matter - 2 cases: with or without replacement

 ### P = n! / (n-r)!
 ### C = n! / [(n-r)!  r!]

 # 10 students - 4students->

 #4 Coats, 3 hats, 2 umbrellas

from scipy.special import perm,comb
result = comb(10,4)
print(result)
# 6B , 4 G => 4 Students:
#1. 4B+0G, 3B +1G, 2B + 2G, 1B + 3G, 0B+4G
c1 = comb(6,4) * comb(4,0)
c2 = comb(6,3) * comb(4,1)
c3 = comb(6,2) * comb(4,2)
c4 = comb(6,1) * comb(4,3)
c5 = comb(6,0) * comb(4,4)
result = c1+c2+c3+c4+c5  #choices within a case multiply; the cases add up to comb(10,4)
print(result)
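The two formulas and the boys/girls case split can be verified with the standard library alone. A small sketch, assuming Python 3.8 or later for math.comb and math.perm:

```python
from math import comb, factorial, perm

n, r = 10, 4

# P = n! / (n-r)!   and   C = n! / [(n-r)! r!]
p_formula = factorial(n) // factorial(n - r)
c_formula = factorial(n) // (factorial(n - r) * factorial(r))
print(p_formula == perm(n, r), c_formula == comb(n, r))

# 6 boys and 4 girls, choose 4 students: one case per number of boys chosen
by_case = [comb(6, boys) * comb(4, 4 - boys) for boys in range(5)]
print(by_case, sum(by_case), comb(10, 4))
```

Within one case the boy and girl choices multiply, and the separate cases are added, which is why the total equals choosing any 4 of the 10 students.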

##4 Coats, 3 hats, 2 umbrellas
## 2
c1=perm(4,2)
c2 = perm(3,2)
c3 = perm(2,2)
result = c1 * c2 * c3
print(result)

###################
#Own a factory: 2 kinds of products: desktop & laptops
#each desktop gives you Rs 1000
# each laptop gives you Rs 2000
### How much is your profit?
# profit: 1000 * D  +  2000 * L  =========> OBJECTIVE
# manpower: 5000 min: D= 100 L= 41
##50 min     120 min <= Total of 5000 min
##1    2   <= 1000
# D = 1000 , L=500

##HDD: 1000
# 1 1:

# F -Full worker, P , R
#Obj:  200*F + 80 * P + 40*R
#Constraints:
## 200*F + 80 * P + 40*R <=4000

14 AUGUST 2022

## Scipy
import numpy as np
from scipy.optimize import minimize, LinearConstraint, linprog

x = 1;y = 1  #x = number of notebooks, y = number of desktops
profit_desktop, profit_notebook = 1000, 750
profit = profit_notebook*x + profit_desktop*y

obj_function = [-profit_notebook, -profit_desktop] #linprog minimizes, so negate the profits to maximize; order is [x, y]
## constraints
lhs_contraint = [[1,1],[1,2],[4,3]]
rhs_constraint = [10000,15000,25000]
bounds =[(0,float("inf")),
         (0,10000)]
opt_sol = linprog(c=obj_function, A_ub=lhs_contraint, b_ub=rhs_constraint,bounds=bounds,
                  method="highs")  #"revised simplex" is no longer available in recent SciPy releases
if opt_sol.success:
    print("Solution is ",opt_sol)

# x + y +2000 =10000
# x+2y +0 =15000  #
# 4x + 3y + 0 <= 25000 #
# 1000 7000
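The 1000 and 7000 noted above can be cross-checked without SciPy by testing the corner points of the feasible region, since a linear objective reaches its best value at a corner. This brute-force check is not how linprog works internally; it is only a sketch for a two-variable problem, using the same constraints, bounds and profit order as the code above.

```python
from itertools import combinations

# Each line is a*x + b*y = c. The first three are the <= constraints from
# the linprog call; the last two are the axes x = 0 and y = 0.
lines = [(1, 1, 10000), (1, 2, 15000), (4, 3, 25000), (1, 0, 0), (0, 1, 0)]
profit = (750, 1000)   # same order as obj_function, before negation

def feasible(x, y, eps=1e-9):
    """True when the point satisfies the bounds and all three constraints."""
    if x < -eps or y < -eps or y > 10000 + eps:
        return False
    return all(a * x + b * y <= c + eps for a, b, c in lines[:3])

best_point, best_value = None, float("-inf")
for (a1, b1, c1), (a2, b2, c2) in combinations(lines, 2):
    det = a1 * b2 - a2 * b1
    if det == 0:
        continue                      # parallel lines never meet
    x = (c1 * b2 - c2 * b1) / det     # Cramer's rule for a 2x2 system
    y = (a1 * c2 - a2 * c1) / det
    if feasible(x, y):
        value = profit[0] * x + profit[1] * y
        if value > best_value:
            best_point, best_value = (x, y), value

print(best_point, best_value)
```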


### Pandas: library
### data type is called dataframe
data = [[1,"Rohit"],[2,"Pant"],[3,"Surya"],[4,"Dhawan"],[5,"Kohli"]]
import pandas as pd
data_df = pd.DataFrame(data, columns=["Position","Player"],index=["First","Second","Third","Fourth","Fifth"])
print(data_df)

#fruit production
data = {
    "Apples": [100,200,150,250],
    "Oranges":[250,200,300,200],
    "Mangoes":[150,700,800,50]
}
data_df = pd.DataFrame(data,index=["Q1 2021","Q2 2021","Q3 2021","Q4 2021"])
print(data_df)

16 AUGUST 2022

# Monday to Friday - 10am to 12 noon
# online class

## Saturday - only offline class- practice
## Sunday - only - practice
##    #######################
#Pandas
import pandas as pd
link="https://raw.githubusercontent.com/swapnilsaurav/Dataset/master/hotel_bookings.csv"
hotel_df = pd.read_csv(link)
#print(hotel_df)
df_shape = hotel_df.shape
print("Shape: ",df_shape)
print("Total rows = ",df_shape[0])
print("Data types: ", hotel_df.dtypes)
print(hotel_df['hotel'])
#filter numeric column
import numpy as np
numericval_df = hotel_df.select_dtypes(include=[np.number])
print(numericval_df)
numeric_cols =numericval_df.columns.values
print("Numeric columns in Hotel df is \n",numeric_cols)
#get non-numeric values
#exclude
nonnumericval_df = hotel_df.select_dtypes(exclude=[np.number])
print(nonnumericval_df)
nonnumeric_cols =nonnumericval_df.columns.values
print("Non-numeric columns in Hotel df is \n",nonnumeric_cols)

import matplotlib.pyplot as plt
#from matplotlib.pyplot import figure
#plt.figure((6,3))
import seaborn as sns
cols_25 = hotel_df.columns[:25]
colors = ['#FF5733','#3333FF']
sns.heatmap(hotel_df[cols_25].isnull(), cmap=sns.color_palette(colors))
plt.show()

for c in hotel_df.columns:
    pct_missing = (np.mean(hotel_df[c].isnull()))*100
    if pct_missing>85:
        print(f"{c} - {pct_missing}%")

17 AUGUST 2022


#Pandas
import pandas as pd
link="https://raw.githubusercontent.com/swapnilsaurav/Dataset/master/hotel_bookings.csv"
hotel_df = pd.read_csv(link)
#print(hotel_df)
df_shape = hotel_df.shape
#print("Shape: ",df_shape)
#print("Total rows = ",df_shape[0])
#print("Data types: ", hotel_df.dtypes)
#print(hotel_df['hotel'])
#filter numeric column
import numpy as np
numericval_df = hotel_df.select_dtypes(include=[np.number])
print(numericval_df)
numeric_cols =numericval_df.columns.values
print("Numeric columns in Hotel df is \n",numeric_cols)
#get non-numeric values
#exclude
nonnumericval_df = hotel_df.select_dtypes(exclude=[np.number])
print(nonnumericval_df)
nonnumeric_cols =nonnumericval_df.columns.values
#print("Non-numeric columns in Hotel df is \n",nonnumeric_cols)

import matplotlib.pyplot as plt
#from matplotlib.pyplot import figure
#plt.figure((6,3))
import seaborn as sns
cols_25 = hotel_df.columns[:25]
colors = ['#FF5733','#3333FF']
sns.heatmap(hotel_df[cols_25].isnull(), cmap=sns.color_palette(colors))
plt.show()

for c in hotel_df.columns:
    missing = hotel_df[c].isnull()
    num_missing = np.sum(missing)

    pct_missing = (np.mean(hotel_df[c].isnull())) * 100
    if pct_missing > 85:
        print(f"{c} - {pct_missing}%")

for c in hotel_df.columns:
    missing = hotel_df[c].isnull()
    num_missing = np.sum(missing)
    if num_missing >0:
        hotel_df[f'{c}_missing'] = missing
#print(hotel_df.shape)
#create missing total column
missing_col_list = [c for c in hotel_df.columns if '_missing' in c]
print(missing_col_list)
hotel_df['_missing'] = hotel_df[missing_col_list].sum(axis=1)
#create bar graph
hotel_df['_missing'].value_counts().plot.bar()  #column names after reset_index() differ between pandas versions, so plot the counts directly
plt.show()
# delete the not required columns and rows
print("Before row dropping: ",hotel_df.shape)
row_missing = hotel_df[hotel_df['_missing'] > 10].index
print("========== ROW MISSING: \n",row_missing)
hotel_df = hotel_df.drop(row_missing, axis=0)  #axis = 0: look for each row
hotel_df = hotel_df.drop(['company'],axis=1)
print("After row & column dropping: ",hotel_df.shape)

for c in hotel_df.columns:
    missing = hotel_df[c].isnull()
    num_missing = np.sum(missing)

    pct_missing = (np.mean(hotel_df[c].isnull())) * 100
    if pct_missing > 0:
        print(f"{c} - {pct_missing}%")

med = hotel_df['babies'].median()
hotel_df['babies'] = hotel_df['babies'].fillna(med)
med = hotel_df['children'].median()
hotel_df['children'] = hotel_df['children'].fillna(med)

mode = hotel_df['meal'].describe()['top']
hotel_df['meal'] = hotel_df['meal'].fillna(mode)

mode = hotel_df['country'].describe()['top']
hotel_df['country'] = hotel_df['country'].fillna(mode)

med = hotel_df['agent'].median()
hotel_df['agent'] = hotel_df['agent'].fillna(med)

mode = hotel_df['deposit_type'].describe()['top']
hotel_df['deposit_type'] = hotel_df['deposit_type'].fillna(mode)

print("Missing values after all replacement:")
for c in hotel_df.columns:
    missing = hotel_df[c].isnull()
    num_missing = np.sum(missing)

    pct_missing = (np.mean(hotel_df[c].isnull())) * 100
    if pct_missing > 0:
        print(f"{c} - {pct_missing}%")

22 AUGUST 2022

import pandas as pd
datadf = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/Dataset/master/Mall_Customers.csv",index_col=0)
#print(datadf)
#slicing
#print(datadf['Gender'])
#print(datadf.iloc[:3,:])
#print(datadf.iloc[:3,-2:])
#print(datadf.loc[[2,4],['Age','Gender']])

#Conditions
print(datadf['Age'].mean())
print(datadf.groupby('Gender').mean())
print(datadf.groupby('Gender')['Age'].mean())
print(datadf.groupby('Gender')['Annual Income (k$)'].sum())
datadf = datadf.drop(['Spending Score (1-100)'],axis=1) #dropping a column (axis=1)
print(datadf)

24 AUGUST 2022

import pandas as pd
datadf1 = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/Dataset/master/user_usage.csv",index_col=0)
datadf2 = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/Dataset/master/user_device.csv",index_col=0)

#Merge:
print("Size of d1: ",datadf1.shape)
print("Size of d2: ",datadf2.shape)
result = pd.merge(datadf1, datadf2,
                  on='use_id',
                  how="left")
print("result df size: ",result.shape)
result = pd.merge(datadf1, datadf2,
                  on='use_id',
                  how="right")
print("result df size: ",result.shape)

result = pd.merge(datadf1, datadf2,
                  on='use_id',
                  how="inner")
print("result df size: ",result.shape)
result = pd.merge(datadf1, datadf2,
                  on='use_id',
                  how="outer")
print("result df size: ",result.shape)  # 159 + 81 + 113 = 353
####  Machine Learning example
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/3_Startups.csv")

#divide this into X (input variables) and y (Output variable)
X = df.iloc[:,:-1].values
y = df.iloc[:,-1].values

#To perform Machine Learning: we need Python Library: Scikit-learn
# Step 1 of Preprocessing : Missing Value handling
# no missing values

#Step 2: Handling categorical values
from sklearn.preprocessing import LabelEncoder
#2.1: Encode
lb = LabelEncoder()
X[:,3] = lb.fit_transform(X[:,3])

#2.2: Column Transform: 1 to many (#of unique values)
#print(X)
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
transform = ColumnTransformer([('one_hot_encoder',OneHotEncoder(),[3])],remainder='passthrough')
X = transform.fit_transform(X)
#2.3 drop any one of the new columns (the remaining ones already carry the information)
X=X[:,1:]
print(X)

AUGUST 25 2022 (Machine Learning)

####  Machine Learning example
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/3_Startups.csv")

#divide this into X (input variables) and y (Output variable)
X = df.iloc[:,:-1].values
y = df.iloc[:,-1].values

#To perform Machine Learning: we need Python Library: Scikit-learn
# Step 1 of Preprocessing : Missing Value handling
# no missing values

#Step 2: Handling categorical values
from sklearn.preprocessing import LabelEncoder
#2.1: Encode
lb = LabelEncoder()
X[:,3] = lb.fit_transform(X[:,3])

#2.2: Column Transform: 1 to many (#of unique values)
#print(X)
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
transform = ColumnTransformer([('one_hot_encoder',OneHotEncoder(),[3])],remainder='passthrough')
X = transform.fit_transform(X)
#2.3 drop any one of the new columns (the remaining ones already carry the information)
X=X[:,1:]

from sklearn.model_selection import train_test_split
X_train, X_test,y_train,y_test =train_test_split(X,y, test_size=0.2)

#selection of algorithm
from sklearn.linear_model import LinearRegression
lm =LinearRegression()
lm.fit(X_train, y_train)  #training
y_pred = lm.predict(X_test)
result_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
print(result_df)
# RMSE
#mse
from sklearn import metrics
mse = metrics.mean_squared_error(y_test,y_pred)
rmse = mse **0.5
print("Root Mean Squared Error is: ",rmse)

#R2

#MAE
# Different phases on ML modeling
## 1. Preprocessing the dataset and making it ready for modeling
## 2. Choosing the right model – Regression / classification / clustering
## 2A. Breaking the dataset into Training and Test data
## 3. Run the model (choosing the algo and running): Training the algo 
## 4. Test your algorithm – parameter tuning 

29 AUGUST 2022

import pandas as pd
txt_df = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/OnlineRetail/master/order_reviews.csv")

#NLP: Natural Language Processing - NLP Analysis
#1. entire text to lowercase
#2. non-English characters: decompose and convert them into plain English characters
#3. convert to utf8
#4. Tokenize: convert a sentence into words
#5. Remove stop words - words which don't carry meaning
######################
import unicodedata
import nltk
#nltk.download('punkt')
#nltk.download('stopwords')

#function to normalize text
def normalize_text(word):
    return unicodedata.normalize('NFKD',word).encode('ascii', errors = 'ignore').decode('utf-8')
#get stop words database
STOP_WORDS = set(normalize_text(word) for word in nltk.corpus.stopwords.words('portuguese'))
#STOP_WORDS =

## function to perform all the analysis
def convert_into_lowercase(comments):
    lower_case = comments.lower()
    unicode = normalize_text(lower_case)  #same normalization as used for the stop words
    words = nltk.tokenize.word_tokenize(unicode)
    words = tuple(word for word in words if word not in STOP_WORDS and word.isalpha())
    return words

analysis_txt = txt_df[txt_df['review_comment_message'].notnull()].copy()
#print(analysis_txt['review_comment_message'])
analysis_txt['review_txt'] =analysis_txt['review_comment_message'].apply(convert_into_lowercase)
#print(analysis_txt['review_txt'])


# Dont buy now
# unigram => Dont, buy, now
# bigram=> Dont buy, buy now
# trigram => Dont buy now
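The unigram, bigram and trigram idea can be tried without nltk. A minimal sketch; the helper name ngrams is hypothetical and only mirrors what nltk.bigrams and nltk.trigrams are used for below.

```python
def ngrams(words, n):
    """Return the list of n-word phrases found in a sequence of words."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

tokens = "Dont buy now".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(unigrams)
print(bigrams)
print(trigrams)
```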

# create 2 datasets
rating_5 = analysis_txt[analysis_txt['review_score']==5]
rating_1 = analysis_txt[analysis_txt['review_score']==1]

def word_to_grams(words):
    unigrams,bigrams,trigrams = [],[],[]
    for w in words:
        unigrams.extend(w)
        bigrams.extend(" ".join(bigram) for bigram in nltk.bigrams(w))
        trigrams.extend(" ".join(trigram) for trigram in nltk.trigrams(w))
    return unigrams,bigrams,trigrams

unigram_5,bigram_5,trigram_5 = word_to_grams(rating_5['review_txt'])
unigram_1,bigram_1,trigram_1 = word_to_grams(rating_1['review_txt'])

#print(unigram_1)
#input()
#print(bigram_1)
#input()
print(trigram_1)
#input()

Web Scraping Project to Practice on AUG 30, 2022

Z Score and Empirical Rule

Refer PreProcessing Notes here

Refer Regression Notes here

SEPTEMBER 9, 2022 CLASS NOTES

import pandas as pd
import numpy as np

data_df = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/3_Startups.csv")
X = data_df.iloc[:,:-1].values
y = data_df.iloc[:,-1].values

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
le_obj = LabelEncoder()
X[:,3] = le_obj.fit_transform(X[:,3])
from sklearn.compose import ColumnTransformer
transform = ColumnTransformer([('one_hot_encoder',OneHotEncoder(),[3])],remainder='passthrough')
X=np.array(transform.fit_transform(X), dtype=float)  #np.float was removed in NumPy 1.24
################### ABOVE THIS COMMON FOR ALL
#drop one column
X = X[:,1:]
#print(X)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.25)
############################## REGRESSION OR CLASSIFICATION
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()

### POLYNOMIAL REGRESSION
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
parameter = [('polynomial',PolynomialFeatures(degree=2)),('model',LinearRegression())]
Pipe = Pipeline(parameter)
Pipe.fit(X_train,y_train)  #fit on the training split only, so the test score is not inflated
from sklearn import metrics
y_prep_poly = Pipe.predict(X_test)
mse = metrics.mean_squared_error(y_test,y_prep_poly)
rmse = np.sqrt(mse)
r2 = metrics.r2_score(y_test,y_prep_poly)
print("POLYNOMIAL: R2 and RMSE: ", r2,rmse)
########### BELOW IS COMMON FOR ALL REGRESSION
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

from sklearn.svm import SVR
svr_obj = SVR(kernel='linear')
svr_obj = SVR(kernel='poly',degree=3, C=100)

mse = metrics.mean_squared_error(y_test,y_pred)
rmse = np.sqrt(mse)
r2 = metrics.r2_score(y_test,y_pred)
print("R2 and RMSE: ", r2,rmse)

import statsmodels.api as sm
from statsmodels.api import OLS
X = sm.add_constant(X)
summary = OLS(y,X).fit().summary()
print(summary)

#First elimination
X_select = X[:,[0,3,5]]  #column 0 is the constant added above
summary = OLS(y,X_select).fit().summary()
print(summary)

import pandas as pd
import numpy as np
from sklearn import metrics
data_df = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/3_Startups.csv")
X = data_df.iloc[:,:-1].values
y = data_df.iloc[:,-1].values

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
le_obj = LabelEncoder()
X[:,3] = le_obj.fit_transform(X[:,3])
from sklearn.compose import ColumnTransformer
transform = ColumnTransformer([('one_hot_encoder',OneHotEncoder(),[3])],remainder='passthrough')
X=np.array(transform.fit_transform(X), dtype=float)  #np.float was removed in NumPy 1.24
################### ABOVE THIS COMMON FOR ALL
#drop one column
X = X[:,1:]
#print(X)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.25)
############################## REGRESSION OR CLASSIFICATION



from sklearn.svm import SVR
svr_obj = SVR(kernel='linear')
svr_obj = SVR(kernel='poly',degree=3, C=100)
i=0.03
while i<=0.06:
    for j in range(10,1000,200):
        svr_obj = SVR(kernel='rbf', C=j,gamma=i)
        y_pred = svr_obj.fit(X_train, y_train).predict(X_test)
        mse = metrics.mean_squared_error(y_test,y_pred)
        rmse = np.sqrt(mse)
        r2 = metrics.r2_score(y_test,y_pred)
        print(f"gamma = {i}, C = {j}, RMSE = {rmse} ")
    i+=0.005  #increase gamma after the inner loop so that 0.03 is also tried

from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor()
regressor.fit(X_train,y_train)
y_pred = regressor.predict(X_test)
mse = metrics.mean_squared_error(y_test,y_pred)
rmse = np.sqrt(mse)
r2 = metrics.r2_score(y_test,y_pred)
print(f" RMSE = {rmse} and R2 = {r2} ")

from sklearn.ensemble import RandomForestRegressor
print("Performing Random Forest regressor")
for i in range(50,1000,75):
    regressor = RandomForestRegressor(n_estimators=i)
    regressor.fit(X_train,y_train)
    y_pred = regressor.predict(X_test)
    mse = metrics.mean_squared_error(y_test,y_pred)
    rmse = np.sqrt(mse)
    r2 = metrics.r2_score(y_test,y_pred)
    print(f" RMSE = {rmse} and R2 = {r2} ")

#Ridge and Lasso regression: left as an assignment

CLASSIFICATION NOTES

SEPTEMBER 11, 2022

import numpy as np
import pandas as pd
dataset = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/5_Ads_Success.csv")
X =dataset.iloc[:,1:4].values
y =dataset.iloc[:,4].values
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
label = LabelEncoder()
X[:,0] = label.fit_transform(X[:,0])
transform = ColumnTransformer([('one_hot_encoder',OneHotEncoder(),[0])],remainder='passthrough')
X=np.array(transform.fit_transform(X), dtype=float)  #np.float was removed in NumPy 1.24
X= X[:,1:]
print(X)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X = sc.fit_transform(X)
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y, test_size=0.25, random_state=1)
##############################
##classifier
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
####################################
#Model Evaluation: build confusion matrix
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
cm_test = confusion_matrix(y_test, y_pred)
y_train_pred = classifier.predict(X_train)
cm_train = confusion_matrix(y_train, y_train_pred)
accuracy_test = accuracy_score(y_test, y_pred)
accuracy_train = accuracy_score(y_train, y_train_pred)
print("CONFUSION MATRIX:\n-------------------")
print("TEST: \n",cm_test)
print("\nTRAINING: \n",cm_train)
print("\n ACCURACY SCORE OF TEST: ",accuracy_test)
print("\nACCURACY SCORE OF TRAINING: ",accuracy_train)
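To see what the confusion matrix and accuracy score hold, here is a hand-built version on made-up labels. It follows the same layout that sklearn's confusion_matrix uses (rows are actual classes, columns are predicted classes), but it is only a sketch for two classes labelled 0 and 1.

```python
y_true = [0, 0, 1, 1, 1, 0, 1, 0]   # made-up actual labels
y_hat = [0, 1, 1, 1, 0, 0, 1, 0]    # made-up predicted labels

# Rows are actual classes, columns are predicted classes
matrix = [[0, 0], [0, 0]]
for actual, predicted in zip(y_true, y_hat):
    matrix[actual][predicted] += 1

# Correct predictions sit on the diagonal
correct = matrix[0][0] + matrix[1][1]
accuracy = correct / len(y_true)

print(matrix)
print("Accuracy:", accuracy)
```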
#############################

12 SEPTEMBER 2022: CLASSIFICATION – SVC, DECISION TREE

import numpy as np
import pandas as pd
dataset = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/5_Ads_Success.csv")
X =dataset.iloc[:,1:4].values
y =dataset.iloc[:,4].values

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
label = LabelEncoder()
X[:,0] = label.fit_transform(X[:,0])
transform = ColumnTransformer([('one_hot_encoder',OneHotEncoder(),[0])],remainder='passthrough')
X=np.array(transform.fit_transform(X), dtype=float)  #np.float was removed in NumPy 1.24
X= X[:,1:]
print(X)

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X = sc.fit_transform(X)

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y, test_size=0.25, random_state=1)
##############################
##classifier
#from sklearn.linear_model import LogisticRegression
#classifier = LogisticRegression()
#from sklearn.svm import SVC
#classifier = SVC(kernel='rbf',gamma=0.1,C=100)
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion='entropy')
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

####################################
#Model Evaluation: build confusion matrix
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
cm_test = confusion_matrix(y_test, y_pred)
y_train_pred = classifier.predict(X_train)
cm_train = confusion_matrix(y_train, y_train_pred)
accuracy_test = accuracy_score(y_test, y_pred)
accuracy_train = accuracy_score(y_train, y_train_pred)

print("CONFUSION MATRIX:\n-------------------")
print("TEST: \n",cm_test)
print("\nTRAINING: \n",cm_train)
print("\n ACCURACY SCORE OF TEST: ",accuracy_test)
print("\nACCURACY SCORE OF TRAINING: ",accuracy_train)

#############################
# Complete the visualization step

13 SEPTEMBER 2022

import numpy as np
import pandas as pd
dataset = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/5_Ads_Success.csv")
X =dataset.iloc[:,1:4].values
y =dataset.iloc[:,4].values

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
label = LabelEncoder()
X[:,0] = label.fit_transform(X[:,0])
transform = ColumnTransformer([('one_hot_encoder',OneHotEncoder(),[0])],remainder='passthrough')
X=np.array(transform.fit_transform(X), dtype=float)  #np.float was removed in NumPy 1.24
X= X[:,1:]
print(X)

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X = sc.fit_transform(X)

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y, test_size=0.25, random_state=1)
##############################
##classifier
from sklearn.ensemble import RandomForestClassifier
#classifier = RandomForestClassifier(n_estimators=100,criterion='entropy')
from sklearn.linear_model import SGDClassifier
classifier = SGDClassifier(max_iter=5000, tol=0.01,penalty="elasticnet")
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

####################################
#Model Evaluation: build confusion matrix
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
cm_test = confusion_matrix(y_test, y_pred)
y_train_pred = classifier.predict(X_train)
cm_train = confusion_matrix(y_train, y_train_pred)
accuracy_test = accuracy_score(y_test, y_pred)
accuracy_train = accuracy_score(y_train, y_train_pred)

print("CONFUSION MATRIX:\n-------------------")
print("TEST: \n",cm_test)
print("\nTRAINING: \n",cm_train)
print("\n ACCURACY SCORE OF TEST: ",accuracy_test)
print("\nACCURACY SCORE OF TRAINING: ",accuracy_train)

#############################
# Complete the visualization step

SEPTEMBER 15 2022

Practice projects from the links below:

1. Predict future sales: https://thecleverprogrammer.com/2022/03/01/future-sales-prediction-with-machine-learning/

2. Predict Tip for the waiter: https://thecleverprogrammer.com/2022/02/01/waiter-tips-prediction-with-machine-learning/

SEPTEMBER 16 2022

1. NLP – Flipkart Review analysis: https://thecleverprogrammer.com/2022/02/15/flipkart-reviews-sentiment-analysis-using-python/

2. Cryptocurrency Price Prediction: https://thecleverprogrammer.com/2021/12/27/cryptocurrency-price-prediction-with-machine-learning/

SEPTEMBER 17 2022

1. Demand Prediction: https://thecleverprogrammer.com/2021/11/22/product-demand-prediction-with-machine-learning/

SEPTEMBER 19 2022

from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
x,y = make_blobs(n_samples= 300, n_features=2,centers=3, random_state=88)
plt.scatter(x[:,0],x[:,1])
plt.show()
from sklearn.cluster import KMeans
cluster_obj = KMeans(n_clusters=2,init='random',max_iter=500)
Y_val = cluster_obj.fit_predict(x)
print(Y_val)
#plotting the centers
plt.scatter(x[Y_val==0,0],x[Y_val==0,1],c="blue",label="Cluster 0")
plt.scatter(x[Y_val==1,0],x[Y_val==1,1],c="red",label="Cluster 1")
#plt.scatter(x[Y_val==2,0],x[Y_val==2,1],c="black",label="Cluster 2")
#plt.scatter(x[Y_val==3,0],x[Y_val==3,1],c="green",label="Cluster 3")
#plt.scatter(x[Y_val==4,0],x[Y_val==4,1],c="Yellow",label="Cluster 4")
plt.show()
#Measure Distortion for elbow graph
distortion = [] #save distortion from each k value
for i in range(1,50):
    cluster_obj = KMeans(n_clusters=i, init='random', max_iter=500)
    cluster_obj.fit(x)
    distortion.append(cluster_obj.inertia_)
print(distortion)
plt.plot(range(1,50),distortion)
plt.show()


from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
x,y = make_blobs(n_samples= 20, n_features=2,centers=3, random_state=88)
plt.scatter(x[:,0],x[:,1])
plt.show()
from sklearn.cluster import KMeans
cluster_obj = KMeans(n_clusters=2,init='random',max_iter=500)
Y_val = cluster_obj.fit_predict(x)
print(Y_val)
#plotting the centers
plt.scatter(x[Y_val==0,0],x[Y_val==0,1],c="blue",label="Cluster 0")
plt.scatter(x[Y_val==1,0],x[Y_val==1,1],c="red",label="Cluster 1")
#plt.scatter(x[Y_val==2,0],x[Y_val==2,1],c="black",label="Cluster 2")
#plt.scatter(x[Y_val==3,0],x[Y_val==3,1],c="green",label="Cluster 3")
#plt.scatter(x[Y_val==4,0],x[Y_val==4,1],c="Yellow",label="Cluster 4")
plt.show()
#Measure Distortion for elbow graph
distortion = [] #save distortion from each k value
for i in range(1,21):  #k cannot be more than the number of samples (20 here)
    cluster_obj = KMeans(n_clusters=i, init='random', max_iter=500)
    cluster_obj.fit(x)
    distortion.append(cluster_obj.inertia_)
print(distortion)
plt.plot(range(1,21),distortion)
plt.show()

SEPTEMBER 20, 2022

import pandas as pd
dataset = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/USArrests.csv")

data_df = dataset.iloc[:,1:]
print(data_df)

import scipy.cluster.hierarchy as sch
import matplotlib.pyplot as plt
plt.figure(figsize=(9,6))
dendo_obj = sch.dendrogram(sch.linkage(data_df))
plt.axhline(y=26)
plt.show()

from sklearn.cluster import AgglomerativeClustering
cluster = AgglomerativeClustering(n_clusters=3)
Y_pred = cluster.fit_predict(data_df)
print(Y_pred)
plt.figure(figsize=(9,6))
plt.scatter(data_df.iloc[:,0],data_df.iloc[:,1], c=cluster.labels_)
plt.show()

Next class on Sunday 25th

Practice the 8 projects below during that time.

Download the project PDF for the code.

Download the dataset from here.

SEPTEMBER 28, 2022

import pandas as pd
from apyori import apriori
data = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/Market_Basket_Optimisation.csv")
print(data.shape)
products = []
cols = 20
for i in range(len(data)):
    #one list of items per transaction (a list, not a generator)
    products.append([str(data.values[i,j]) for j in range(cols)])

#print(products)
association = apriori(products,min_support=0.001,min_confidence=0.1,min_lift=2)
print("Associated Products are: \n",list(association))

############################

SEPTEMBER 29, 2022

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
import numpy as np
#from statsmodels.tsa.arima_model import ARIMA - removed
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.seasonal import seasonal_decompose

data_df = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/Dataset/master/AirPassengers.csv",
                      index_col=['Month'],parse_dates=['Month'])
rolling_mean = data_df.rolling(window=12).mean()
rolling_std = data_df.rolling(window=12).std()

plt.plot(data_df, label="Original Data")
plt.plot(rolling_mean, color="red", label="Rolling Mean")
plt.plot(rolling_std, color="green", label="Rolling StdDev")
plt.show()

afduller_result = adfuller(data_df['#Passengers'])
print("ADF Stats = ",afduller_result[0])
print("P-Value = ",afduller_result[1]) #<0.05 then its stationary
for k,v in afduller_result[4].items():
    print(k," : ",v)

#to move towards stationarity - take the log of the values (this stabilises the variance)

data_log = np.log(data_df)
mean_log = data_log.rolling(window=12).mean()
std_log = data_log.rolling(window=12).std()
plt.plot(data_log, color="blue",label="Log of Original Data")
#plt.plot(data_df, color="black",label="Original Data")
plt.plot(mean_log, color="red", label="Rolling Mean")
plt.plot(std_log, color="green", label="Rolling StdDev")
plt.title("All information")
plt.legend()
plt.show()

#Now we will perform TSA using ARIMA model
#Prediction
order_val = (2,1,2)
#tsa_model = ARIMA(data_log, order = order_val)  #old
tsa_model = ARIMA(data_df['#Passengers'].values, order=(2, 1, 2))
tsa_result = tsa_model.fit()
print("Summary: \n",tsa_result.summary())

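# we have 12 * 12 months of data + 12 * 10 months to predict = 264
# predict() takes start and end positions; an end beyond the data gives future values
data = tsa_result.predict(start=1, end=263)  #fitted values + next 10 yrs
plt.plot(data_df['#Passengers'].values,color="blue",label="Original Data")
plt.plot(range(1,264),data,color="red",label="Fitted Value")
plt.title("Original Data and Predicted Values")
plt.legend()
plt.show()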



#new
# make predictions
predictions = tsa_result.forecast(120)
plt.plot(predictions,color="red")
plt.title("Using Forecast Method")
plt.show()

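NOTE: from statsmodels.tsa.arima_model import ARIMA has been removed from statsmodels, instead use:

statsmodels.tsa.arima.model.ARIMA

predict() and forecast() are both available on the fitted result.
predict() takes start and end positions counted from the beginning of the data, so it can return fitted values as well as future ones; forecast() takes only the number of future periods.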

OCTOBER 8 2022

Click here for entire R content

Machine Learning with R

Click here to access Regression using R code

Click here to access Classification using R code

Click here to access Clustering using R code

Click here to access Summary of ML using R code


Learning

Jul 31 2022

Learning ML – July 2022 by Digital

DAY 1: 31 JULY 2022

print(5+6+3)
print('5+6+3')
print('DDDDDDDDDDDDD')
print("DDDDDDD\tDDDDDDD",4+3+2,"This is it")
print("\\\\n is for newline")
# \ escape sequence
print("Good Morning",end="! ")
print("How are you", end="? ")
print("I am fine")
print(5+3)
a=5  # anything written after # is ignored, even code like print()
b=3
print(a+b)
a=50
#comment
'''
This is a sample comment
Please ignore
another line of
comment
'''

# Data Types - basic: int, float, str, bool, complex
# int - integer: non decimal values: 8 -5 -4 99 8989999
# float - +/- values with decimal point:  5.0 -5.0 8.989
# str - string  "hello" 'morning'   '''How'''  """Thanks"""
#bool - boolean:  True  / False
#complex - uses the square root of -1 (i in maths, written as j in Python), e.g. 5j;  sqrt(-100) = sqrt(100 * -1) = 10j

val1 = 5
print(val1, " : ",type(val1))  #type
val1 = -3.0
print(val1, " : ",type(val1))
val1 = "Hello"
print(val1, " : ",type(val1))
val1 = True
print(val1, " : ",type(val1))
val1 = 4+2j
print(val1, " : ",type(val1))
val2 = 4-2j
print(val1 * val2) #(a+b)(a-b) = a sq - b sq:  16 - (2j) sq = 16 + 4 = (20+0j)
a,b,c = 3,8.6,True
print(a,type(b),c)
#  5.6 + 4.4 = 10.0

Click here to Watch the Video Day 1

DAY 2: 1 AUGUST 2022

Download the Python Installer from here (we will install Python 3.9.9):

https://www.python.org/downloads/release/python-399/

You can follow the instructions to install from here:

http://learn.swapnil.pw/python/pythoninstallation/

Watch installation video from here: https://www.youtube.com/watch?v=mhpu9AsZNiQ

Download & Install IDE for Python Programming

Follow the steps from here: https://learn.swapnil.pw/python/pythonides/

DAY 3: 2 AUGUST 2022

#Basic data types

#int
print("Working with Integer: ARITHMETIC OPERATIONS")
var1 = 7
var2 = 3
print(var1 + var2)  #add
print(var1 - var2)  #difference
print(var1 * var2) #multiply
print(var1 / var2)  #divide (float value)
print(var1 // var2)  #integer divide (only integer part)
print(var1 ** var2)  #var1 to the power of var2
print(var1 % var2)   #modulo - this gives us remainder
print( 4+5/2+4-5*3+2*5)
print( 5.5)
#float
print("Working with Float now")
var1 = 7.0
var2 = 3.0
print(var1 + var2)  #add
print(var1 - var2)  #difference
print(var1 * var2) #multiply
print(var1 / var2)  #divide (float value)
print(var1 // var2)  #integer divide (only integer part)
print(var1 ** var2)  #var1 to the power of var2
print(var1 % var2)   #modulo - this gives us remainder
#str
print("Working with Strings now")
var1 = 'Good Morning'
var2 = "How are you?"
var3 = '''How are you today?
Whats your plan?
Are you doing well?'''
var4 = """I am fine"""
print(var1)
print(var3)
print(var1 + " "+var2)
print((var1 + " ")* 3)
#bool
var1 = True
var2 = False
#AND OR NOT XOR - operations for boolean values
# AND : If any one of the values is False then the result will be False
#- otherwise it will be True

print("AND OPERATION")
print(True and True)
print(True and False)
print(False and True)
print(False and False)
#Prediction:  Rohit and Surya will open the batting
#Actual:  Rohit and Pant opened the batting
print("OR OPERATION")
print(True or True)
print(True or False)
print(False or True)
print(False or False)
#Prediction:  Rohit or Surya will open the batting
#Actual:  Rohit and Pant opened the batting

#complex (imaginary numbers)
print("Working with Complex now")
var1 = 6j
print(var1 **2)  #6j*6j = 36* -1= -36 + 0j
print(var1 * var1)

#Comparison Operator: Output is always boolean
print(5 > 6)  #False
print(6 < 6)  #False
print(5 <= 6)  #True
print(6 >= 6) #True
print(6==6)  #True
print(5!=5) #False

DAY 3 Video Recording

DAY 4 : AUGUST 3, 2022

#WAP to find the total cost when quantity and price of each item is given
quant = 19
price_each_item = 46
total_cost = price_each_item * quant
print("Total cost for",quant,"quantities with each costing Rs",price_each_item,"is Rs",total_cost)
print(f"Total cost for {quant} quantities with each costing Rs {price_each_item} is Rs {total_cost}")

quant = 3
total_cost = 100
price_each_item = total_cost / quant
print(f"Total cost for {quant} quantities with each costing Rs {price_each_item:0.2f} is Rs {total_cost}")

#Format string values
name = "Kohli"
country = "India"
position = "One Down"
print(f"Player {name:.<15} represents {country:^10} and plays at {position:>10} for international matches")

name = "Ombantabawa"; country = "Zimbabwe"; position = "opener"
print(f"Player {name:<15} represents {country:_^10} and plays at {position:X>10} for international matches")
# Add padding - we will fix the number of spaces for each variable

#logical line v physical line
#; to indicate end of line but its not mandatory

a,b,c,d = 10,20,30,40  #assign multiple values
print(c) #30
print("Hello\nThere")
print("Good Morning",end=". ")
print("How are you?")

# wap to take side of a square as input from the user and calculate area and perimeter
#area = side ** 2
#perimeter = 4 * side
side = input("Enter the side value: ") # is used to take input (dynamic value)
side = int(side) # to convert into integer, we use int(). float -> float(), string -> str()
print(side, ":",type(side))

perimeter = 4 * side
area = side ** 2
print(f"Square of {side} has a perimeter of {perimeter} and the area is {area}")

## Use input() and formatting for below assignments
#1. Input the length and breadth of a rectangle and calculate area and perimeter
#2. Input radius of a circle and calculate area and circumference
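
One possible way to approach the two assignments above, as a minimal sketch. The function names rectangle_metrics and circle_metrics are made up for this illustration, and fixed sample values are used so the code runs without typing anything; for the assignment itself, read the values with input() as shown for the square.

```python
import math

def rectangle_metrics(length, breadth):
    """Return (area, perimeter) of a rectangle."""
    area = length * breadth
    perimeter = 2 * (length + breadth)
    return area, perimeter

def circle_metrics(radius):
    """Return (area, circumference) of a circle."""
    area = math.pi * radius ** 2
    circumference = 2 * math.pi * radius
    return area, circumference

# Sample values (assumption); replace with int(input(...)) / float(input(...))
length, breadth = 5, 3
area, perimeter = rectangle_metrics(length, breadth)
print(f"Rectangle {length} x {breadth} has area {area} and perimeter {perimeter}")

radius = 2.0
area, circumference = circle_metrics(radius)
# :0.2f keeps two digits after the decimal point, as in the total cost example
print(f"Circle of radius {radius} has area {area:0.2f} and circumference {circumference:0.2f}")
```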

Click here to View Day 4 Video

DAY 5: AUGUST 4, 2022

#Conditions -  IF, IF-ELSE,  IF - ELIF - ELSE, IF - ELIF............ ELSE
avg = 40
#if avg > 50: pass
if avg >50:
    print("Congratulations")
    print("You have passed")
    print("Great job")
else:
    print("Sorry, you have failed")

# avg >= 70: Grade A,  avg >= 50: Grade B, otherwise Grade C
avg = 60
if avg >=70:
    print("Grade A")
elif avg >=50:
    print("Grade B")
else:
    print("Grade C")

#avg >=80: A, >=70: B, >=60: C, >=50: D, >=40: E, <40: F
avg = 90
#Nested IF- if inside another if
if avg>=80:
    if avg >=90:
        print("AWESOME PERFORMANCE")
        if avg >=95:
            print("You win Presidents Medal")

    print('grade A')
elif avg>=70:
    print('grade B')
elif avg>=60:
    print('grade C')
elif avg>=50:
    print('grade D')
##########
avg=90
if avg>=80:
    print ('grade A')
elif avg>=70:
    print('grade B')
elif avg>=60:
    print('grade C')
elif avg>=50:
    print('grade D')
elif avg>=40:
    print('grade E')
else:
    print('grade F')

#WAP to check if a person is eligible to vote in India or not
#If age >=18, then check the nationality

age = 18
nationality = "Indian"  #must match the text in the comparison below (case-sensitive)
if age >=18:
    if nationality =="Indian":
        print("You are eligible to vote in India")
    else:
        print("Sorry, only Indians are eligible to vote")
else:
    print("Sorry you do not meet the required criteria")

age =20
if age>=18:
    print("Awesome, you are eligible to vote")


#Assignment: Take 3 numbers and print the highest, second highest and the lowest value
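
A minimal sketch of one way to do this assignment using only if / elif / else. The function name rank_three and the sample numbers are assumptions made for this illustration.

```python
def rank_three(a, b, c):
    """Return (highest, second highest, lowest) of three numbers."""
    if a >= b and a >= c:        # a is the highest
        highest = a
        if b >= c:
            second, lowest = b, c
        else:
            second, lowest = c, b
    elif b >= a and b >= c:      # b is the highest
        highest = b
        if a >= c:
            second, lowest = a, c
        else:
            second, lowest = c, a
    else:                        # c is the highest
        highest = c
        if a >= b:
            second, lowest = a, b
        else:
            second, lowest = b, a
    return highest, second, lowest

highest, second, lowest = rank_three(45, 75, 35)
print(f"Highest: {highest}, second highest: {second}, lowest: {lowest}")
```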

Click here for Day 5 Video

DAY 6: 11 AUGUST 2022

#Strings
val1 = "Hello"
val2 = 'Good Morning'
val3 = '''Hello 
how are 
you'''
val4 = """I am fine thank you"""

# what's your name?
print("what's your name?")
print('what\'s your name?')  #escape character - \ it works only for 1 character after
print("\\n will give new line")
print("\\\\n will give new line")
val1 = "Hellodsogjidaoioadpif orgpoaitpoaigtpoafdifgpo poergpadigpifgpi igopigof oprgiodfigdofig"

#indexing /slicing - []
print(val1[0])  #first character
print(val1[4])
print(len(val1))
tot_char = len(val1)
print(val1[tot_char - 1])   #last character
print(val1[tot_char - 3])   #3rd last character
print(val1[- 1])   #last character
#series of characters in continuation
val1 = "GOOD DAY"
print(val1[5:8])
print(val1[0:4])
print(val1[:4])
#negative indexes
print(val1[-5:-2])
print(val1[-7:-5])
print(val1[-3:])  #DAY
print(val1[:])  #GOOD DAY (the complete string)

Click here to Watch Day 6 Video

Day 7 – AUGUST 16 , 2022 STRING -2 and Basic Intro to IF Condition

#Methods in String
txt1 = "Good Morning"
print(txt1[-5:-1])
#len(txt1)
fname="Sachin"
print(fname.isalpha())
print(fname.isupper())  #SACHIN
print(fname.islower())  #sachin tendulkar
print(fname.istitle())  #Sachin Tendulkar
print(fname.isdigit())
print(fname.isalnum())
print(fname.upper())
print(fname.lower())
print(fname.title())
fname = "Sachin Tendulkar"  #input("Enter your name: ")
print(fname.upper().count("S"))
print(fname.upper().count("S",1,7))
txt1 = "First Second Thirty first thirty fifth sixty first sixty ninth"
print("Total first are: ",txt1.upper().count("ST "))
print(fname.lower().index("ten"))
print(fname.lower().replace("ten","eleven"))

#############  IF Condition ###########
fname = "Sachin Tendulkar"
ten_count = fname.lower().count("ten")
print(ten_count)
if ten_count >0:
    print("Counting the number of ten(s)")
    print(fname.lower().index("ten"))
print("Thank You")

Click here for Day 7 Video

Access free Python, Machine Learning and various other learning material from the STORY MANTRA app (available on Android only):
https://play.google.com/store/apps/details?id=com.hachiweb.story_mantra&hl=en_IN&gl=US
Download, install and log in. You can search for subjects; for example, Python material is under TECHNICAL -> PYTHON.

DAY 8 : AUGUST 18 , 2022

# Conditions
avg = 30
if avg >=40:
    print("You have passed")
    print("Congratulations")
else:
    print("I am in else")
    print("Sorry, you haven't passed")
print("Thank you")

num = 0
if num >0:
    print(f"{num} is positive")
elif num==0:
    print("Zero is neither positive nor negative")
else:
    print(f"{num} is negative")

### Take marks in 5 subjects, calculate total and avg, based on avg assign grades
marks1 = int(input("Enter marks in subject 1: "))
marks2 = int(input("Enter marks in subject 2: "))
marks3 = int(input("Enter marks in subject 3: "))
marks4 = int(input("Enter marks in subject 4: "))
marks5 = int(input("Enter marks in subject 5: "))
total = marks1 + marks2 + marks3 + marks4 + marks5
avg = total / 5
print(f"Student has scored total marks of {total} and average of {avg}")
#avg>=80: A, avg>=70: B, avg>=60: C, avg>=50: D, avg>=40: E, avg<40: Failed
#avg >=90: win school medal  / avg>95: President Medal
if avg>=80:
    print("You have scored Grade A")
    if avg>=90:
        if avg>=95:
            print("You win President Medal")
        else:
            print("You win School Medal")
elif avg>=70:
    print("You have scored Grade B")
elif avg>=60:
    print("You have scored Grade C")
elif avg>=50:
    print("You have scored Grade D")
elif avg>=40:
    print("You have scored Grade E")
else:
    print("You have scored Failed Grade")
    if avg>=35:
        print("You just missed, try harder next time")
    elif avg>=20:
        print("Please study hard")
    else:
        print("You are too far behind")
##########
#WAP to find the bigger of the 2 numbers
num1,num2 = 30,50
if num1>=num2:
    print(f"{num1} is greater than or equal to {num2}")
else:
    print(f"{num2} is greater than {num1}")

#WAP to find the bigger of the 3 numbers
num1,num2,num3 = 90,50,140
b1 = num1
if num1>=num2: #between num1 and num2, num1 is greater or equal
    if num1 >= num3:
        b1=num1
    else: #num1 >= num2 and num3 > num1
        b1=num3

else: #num2 is greater than num1
    if num2 > num3:
        b1=num2
    else: #num2 > num1 and num3 >= num2
        b1=num3
print(f"{b1} is greatest")

## get the order of 3 numbers (decreasing order)
num1,num2,num3 = 9,50,40
b1,b2,b3 = num1,num1,num1
if num1>=num2: #between num1 and num2 we know num1 is greater
    if num1 >= num3:
        b1=num1
        if num2 >=num3:
            b2,b3=num2, num3
        else:
            b2, b3 = num3, num2
    else: #num1 >num2 and num3 > num1
        b1,b2,b3=num3,num1,num2

else: #num2 is greater than num1
    if num2 > num3:
        b1=num2
        if num1>=num3:
            b2,b3=num1,num3
        else:
            b2, b3 = num3, num1
    else: #num2 > num1 and num3 >= num2
        b1,b2,b3=num3,num2,num1

print(f"{b1} >= {b2} >= {b3}")

Click here to Watch Day 8 Video

DAY 9: AUGUST 20, 2022

#Loops : repeat set of lines of code multiple times
# for - for loop when we know how many times
# while - used for repetition based on conditions
for i in range(0,5,1): #generate values starting from zero upto 5(excluded), increment is 1
      print(i+1)

for i in range(2,15,4): #generate values starting from 2 upto 15(excluded), increment is 4
      #print(i+100)  #range gives 2, 6, 10, 14
      print("Hello")
for j in range(3,7):  #start & end - default is increment = 1
      print(j)
for i in range(4): #its ending value, default is start(=0) & increment (=1)
      print(i+1)

n = 5
for i in range(n):
      print("*",end=" ")
'''
* * * * * 
* * * * * 
* * * * * 
* * * * * 
* * * * * 
'''
print("\n2...........")
for j in range(n):
      print()
      for i in range(n):
            print("*",end=" ")
print()

'''
*  
* *  
* * *  
* * * *  
* * * * * 
'''
print("\n3...........")
for j in range(n):
      print()
      for i in range(j+1):
            print("*",end=" ")
print()

'''  
* * * * * 
* * * *
* * *
* *
* 
'''
print("\n4...........")
for j in range(n):
      print()
      for i in range(n-j):
            print("*",end=" ")
print()

#Assignment
'''  
    * 
   * * 
  * * * 
 * * * * 
* * * * *  
'''
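
A minimal sketch of one way to approach the pyramid assignment above: each row needs fewer leading spaces and one more star than the row before it. The function name pyramid_rows is made up for this illustration.

```python
def pyramid_rows(n):
    """Build the rows of a centred star pyramid of height n."""
    rows = []
    for j in range(n):
        spaces = " " * (n - j - 1)        # leading spaces: n-1, n-2, ... 0
        stars = " ".join(["*"] * (j + 1)) # j+1 stars separated by single spaces
        rows.append(spaces + stars)
    return rows

n = 5
for row in pyramid_rows(n):
    print(row)
```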

Click Here to Watch Day 9 Video

DAY 10: AUGUST 21, 2022

#While
##repeat block of code
 
n=1
while n<=5:
    print(n)
    n+=1
 
#wap to read marks in 3 subjects and calculate sum and average for as long as the user wants
choice = "y"

while choice=="y": #entry check is important here
    sum=0
    for i in range(3):
        marks = int(input("Enter the marks in subject "+str(i+1)+": "))
        sum+=marks
    avg = sum/3
    print(f"Sum is {sum} and average is {avg}")
    choice = input("Type y to continue, any other key to stop: ")

# instances where entry check is not important, you can create an infinite loop
while True: #entry check is not important
    sum=0
    for i in range(3):
        marks = int(input("Enter the marks in subject "+str(i+1)+": "))
        sum+=marks
    avg = sum/3
    print(f"Sum is {sum} and average is {avg}")
    choice = input("Type y to continue, any other key to stop: ")
    if choice!='y':
        break


#wap to generate numbers between given input values
sn = int(input("Enter the start number: "))
en = int(input("Enter the end number: "))
for i in range(sn,en+1):
    print(i, end="   ")
print()

while sn <= en:  #entry check is important
    print(sn, end="   ")
    sn+=1
print()
#Assignment 1: WAP to check if a number is prime or not
#Assignment 2: WAP to generate first 10 multiples of given value
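
A minimal sketch of one way to approach the two assignments above using the loops from this session. The function names is_prime and first_multiples, and the sample values, are assumptions made for this illustration.

```python
def is_prime(num):
    """Return True if num is prime, checked with a while loop."""
    if num < 2:
        return False
    divisor = 2
    # It is enough to test divisors up to the square root of num
    while divisor * divisor <= num:
        if num % divisor == 0:
            return False
        divisor += 1
    return True

def first_multiples(value, count=10):
    """Return the first `count` multiples of value, built with a for loop."""
    multiples = []
    for i in range(1, count + 1):
        multiples.append(value * i)
    return multiples

print("Is 29 prime?", is_prime(29))
print("Multiples of 7:", first_multiples(7))
```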

Day 10 Video: Watch Here

DAY 11: AUGUST 22, 2022

#wap to read menu options
import getpass
dict_username = {}
while True:
    print("Select your options: ")
    print("1. Register \n2. Add Member\n3. Add Books\n4. Issue Books \n5. Return Books")
    print("6. Display Username")
    print("\n11. Quit")

    ch=int(input("Your Option: "))
    if ch==1:
        uname = input("Enter username: ")
        passwd = getpass.getpass("Enter Password: ")  #input("Enter password: ")
        t_dict = {uname:passwd}
        dict_username.update(t_dict)
    elif ch==2:
        pass
    elif ch==3:
        pass
    elif ch==4:
        pass
    elif ch==5:
        pass
    elif ch==11:
        break
    elif ch==6:
        #Displaying username
        print("Usernames are:")
        for i in dict_username.keys():  #keys are the usernames; values() would print the passwords
            print(i)
    else:
        print("Invalid option! Please try again...  ")
        continue

    print("Your Option has been successfully completed!")

#wap to guess the number thought by the computer
import random
comp_num = random.randint(1,100) #int(input("Enter a number: "))
counter = 0
while True:
    guess_num = int(input("Guess the number: "))
    counter+=1
    if guess_num == comp_num:
        print(f"You have guessed the number correctly in {counter} attempts!")
        break
    else:
        print("You have not correctly guessed the number!")
        if guess_num > comp_num:
            print("HINT: You have guessed a higher number!!!")
        else:
            print("HINT: You have guessed a lower number!!!")

#wap to guess the number thought by the computer
import random
comp_num = random.randint(1,100) #int(input("Enter a number: "))
counter = 0
low,high=1,100
while True:
    guess_num = random.randint(low,high) #int(input("Guess the number: "))
    counter+=1
    if guess_num == comp_num:
        print(f"You have guessed the number {comp_num} correctly in {counter} attempts!")
        break
    else:
        print(f"{guess_num} is not correct")
        if guess_num > comp_num:
            #print("HINT: You have guessed a higher number!!!")
            high = guess_num-1
        else:
            #print("HINT: You have guessed a lower number!!!")
            low = guess_num+1


#wap to guess the number thought by the computer
import random
comp_num = random.randint(1,100) #int(input("Enter a number: "))
counter = 0
low,high=1,100
while True:
    guess_num = (low+high)//2  #random.randint(low,high) #int(input("Guess the number: "))
    counter+=1
    if guess_num == comp_num:
        print(f"You have guessed the number {comp_num} correctly in {counter} attempts!")
        break
    else:
        print(f"{guess_num} is not correct")
        if guess_num > comp_num:
            #print("HINT: You have guessed a higher number!!!")
            high = guess_num-1
        else:
            #print("HINT: You have guessed a lower number!!!")
            low = guess_num+1
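
The last version above always guesses the middle of the remaining range, so each wrong guess removes about half of the candidates. The sketch below wraps the same logic in a function (count_guesses is a made-up name for this illustration) and tries every possible secret number to see the worst case.

```python
def count_guesses(secret, low=1, high=100):
    """Count the attempts needed when always guessing the middle value."""
    counter = 0
    while True:
        guess = (low + high) // 2
        counter += 1
        if guess == secret:
            return counter
        if guess > secret:
            high = guess - 1   # secret must be in the lower part
        else:
            low = guess + 1    # secret must be in the upper part

# Try every number from 1 to 100 and keep the largest attempt count
worst = max(count_guesses(n) for n in range(1, 101))
print("Worst case attempts for 1 to 100:", worst)
```

For the range 1 to 100 the worst case is 7 attempts, compared with up to 100 when guessing numbers one by one.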

Day 11 Video: Watch Here

Day 12: AUGUST 25 2022 – STRING – 2 and LIST – 1

#Strings

txt1 = "Hello"
txt2 = 'Good Morning Good Day'
txt3 = '''How are you?
where are you going
when will you be back'''
txt4 = """I am fine
I am going to school
I will be back in the evening"""
print(type(txt1), type(txt2),type(txt3),type(txt4))
print(txt3)
print(txt2)

print(txt1 + txt2)
print("7" + "8")
print(txt1 * 4)
num = 78
num = int(str(78)*4)
print(num)
print("Fine" in txt4) #membership test (case-sensitive)
for i in txt1:
    print(i, end=" ")
print()

txt1 = "Good Morning"
#Strings are immutable
#txt1[0]="H" - you can't edit/overwrite
txt1 = "H" + txt1[1:]
print(txt1)
#reverse the text
reverse_str = ""
for i in txt1:
    reverse_str = i+reverse_str
print("Reversed String: ",reverse_str)

given_txt = "Sachin;Kohli;Rohit;Kapil;Dhoni;"
if ";" in given_txt:
    given_txt = given_txt.replace(";"," ")
    print(given_txt)
else:
    print("Sorry, text doesn't have ; as separator")

# WAP to read a string and find sum of all the numbers only 
#and keep doing till you find single number
txt1 = "sifdsdi43250934ur934ur09csdi43250934ur09c"

while len(txt1)!=1:
    sum=0
    for i in txt1:
        if i.isdigit():
            sum+=int(i)
    txt1 = str(sum)

print("Final Sum is ",sum)

#List
list1 = [2,4,5.5,"Hello", True,[3,6,9]]
print(type(list1))
print(len(list1))
print(list1[0])
print(list1[-1])
print(list1[-3:-1])
print(list1[-2:])
print(type(list1[-2]))
print(list1[-3][-3:])
print(list1[1]+list1[0])
list2 = [4,8,12]
print("list addition: ",list1 +list2)
for i in list2:
    print(i)

##############
list1 = []
#adding members using append
list1.append(5) #added at the back
list1.append(15)
list1.append(25)
list1.append(35)
print(list1)
#adding using insert(position,value)
list1.insert(1,10)
list1.insert(3,20)
print(list1)

#WAP to read marks in 5 subjects and calculate the total
sum=0
marks = []
for i in range(5):
    m = int(input("Enter marks for subject "+str(i+1)+": "))
    sum+=m
    marks.append(m)
print(f"Marks obtained are {marks} and the total is {sum}")

# modify the above program to read marks of 5 students
#all_marks = [[],[],[],[]]

Watch Day 12 Video Here

DAY 13 : AUGUST 27, 2022

#27 AUGUST 2022

list1 = []
print(len(list1))
#append() – adds at the last
#insert() – inserts at given position
#pop() – removes from given position
#remove() – removes given value

#Queues: First In First Out (FIFO)
my_queue = []
while True:
    print("Select following options:")
    print("1. Display the content of the Queue\n2. Add a new member")
    print("3. Remove the member\n4. Exit")
    ch=input("Enter your choice:")
    if ch=="2":
        inp = input("Enter the member to be added: ")
        my_queue.append(inp)
    elif ch=="3":
        if len(my_queue)<=0:
            print("Sorry, there is no one in the queue!")
            continue
        my_queue.pop(0)
    elif ch=="1":
        print("Current members in the queue: \n",my_queue)
    elif ch=="4":
        break
    else:
        print("Invalid Option, try again!")
#Stack: Last In First Out (LIFO)

#Implement Stack as assignment
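
A minimal sketch of the stack assignment, without the menu. The queue above removes from the front with pop(0); a stack removes from the back, so pop() is called without a position. The function names push and pop_member and the sample names are assumptions made for this illustration.

```python
def push(stack, member):
    """Add a member on top of the stack (the end of the list)."""
    stack.append(member)

def pop_member(stack):
    """Remove and return the most recently added member, or None if empty."""
    if len(stack) <= 0:
        print("Sorry, the stack is empty!")
        return None
    return stack.pop()   # pop() without a position removes the last item

my_stack = []
push(my_stack, "Sachin")
push(my_stack, "Kohli")
push(my_stack, "Rohit")
print("Current stack:", my_stack)
print("Removed:", pop_member(my_stack))   # last in, first out
print("Current stack:", my_stack)
```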

#Strings are immutable
str1 = "Hello"
#str1[1]= "E"
list1 = ["H","E","L","L","O"]
list1[1] = "K"
print(list1)
#Lists are MUTABLE
list1 = ["H","E","L","L","O"]
list2 = list1   #alias: both names refer to the same list (no copy is made)
list3 = list1.copy()   #shallow copy: a new, separate list
print("1. List 1", list1)
print("1. List 2", list2)
print("1. List 3", list3)
list1.append("K")
list2.append("L")
list3.append("M")
print("2. List 1", list1)
print("2. List 2", list2)
print("2. List 3", list3)

list3.clear()
print(list3)
del list3  #delete the variable
#print(list3)
count = list1.count("L")
print(count)
list3 = list1 + list2
#list1 = list1 + list2
list1.extend(list2)
print(list1)
print(list1.index("O"))
#index can take 2 other values:
## 1. start value - it will search in the list from this index onwards
## 2. end value - search till this index
value_to_search = "L"
count = list1.count(value_to_search)
print(f"The indexes of {value_to_search} are: ",end="")
start_search = 0
for i in range(count):
    ind = list1.index(value_to_search,start_search)
    print(ind,end="  ")
    start_search= ind + 1
print()
# reverse() - reverse the list  values
list1.reverse()
print(list1)
list1.sort()
print(list1)
#list1.reverse()
list1.sort(reverse = True)
print(list1)
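
A minimal sketch of the three cases that are easy to mix up: plain assignment, list.copy() and copy.deepcopy() from the standard copy module. The variable names and the sample list (which contains an inner list on purpose) are made up for this illustration.

```python
import copy

original = [1, 2, [10, 20]]
alias = original                 # no copy: second name for the same list
shallow = original.copy()        # new outer list, inner list is still shared
deep = copy.deepcopy(original)   # fully independent, inner list copied too

original.append(3)               # changes the outer list
original[2].append(30)           # changes the shared inner list

print("original:", original)
print("alias   :", alias)        # follows every change
print("shallow :", shallow)      # misses the append(3) but sees the 30
print("deep    :", deep)         # sees neither change
```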

Watch Day 13 Video Here

DAY 14 : AUGUST 28, 2022

#TUPLE
#Its immutable version of list
t1 =(3,5,7,9,11,3,5,7,9,3,5)
print(type(t1))
print(t1.count(7))
print(t1.index(5))

t1 = list(t1)
t1=()
t2=(2,3)
#just one value in tuple:
t3 = (3,)

if (23,54) > (23,54,99,89):
  print(23,54)

#unpacking
t1 = (3,5,7)
a,b,c = t1
print(a,b,c)


#Dictionary
dict1 = {9:"Sachin", "Name": "Rohit", True : "Cricket"}
#keys can be any immutable (hashable) value, and they have to be unique
print(dict1[True])
temp = {5.6: "Mumbai"}
dict1.update(temp)

print(dict1)
#wap to input marks in 3 subjects and save under rollno
all_info = {}
for i in range(3):
  temp_dict ={}
  t_list = []
  rollno = int(input("Enter the Roll No.: "))
  for j in range(3):
    m = int(input("Enter the marks: "))
    t_list.append(m)
  temp_dict = {rollno: t_list}
  all_info.update(temp_dict)

print("Marks of all students are: ",all_info)

all_info = {101: [76, 78, 69], 102: [98, 56, 71], 68: [52, 89, 88]}
print(all_info.keys())
print(all_info.values())
print(all_info.items())

for i,j in all_info.items():
  print(i," : ",j)

Watch Day 14 Video Here

DAY 15 : AUGUST 29, 2022

#WAP where we input date, month and year in numbers
# and display as - date(st/nd/rd/th) Month_in_text Year
# eg. date = 25  month = 8 year = 2022
# output would be 25th August 2022
month_txt = ['January', 'February','March','April','May',
             'June', 'July','August','September','October',
             'November', 'December']
date_th = ['st','nd','rd'] + 17*['th'] + ['st','nd','rd'] +7*['th'] +['st']
date = int(input("Enter the Date: "))
month = int(input("Enter the Month: "))
year = input("Enter the Year: ")
result = str(date)+date_th[date-1]+" " + month_txt[month-1]+" " + year
print(result)

#Assignment: input marks of 5 subjects for 5 students and display
#the highest marks in each subject and also for overall and name the student

### Assignment - rewrite the below program by using list

#wap to arrange given 3 numbers in increasing order
#enter numbers as: 45, 75, 35 =>   35   45   75
a,b,c = 85, 75,95
l1,l2,l3 = a,a,a
if a < b: #when a is less than b
    if a<c: # a is less than b and a is less than c
        l1 = a
        if b<c:
            l2,l3=b,c
        else: #c is less than b
            l2,l3 = c,b
    else:  #a is less than b and greater than c [e.g. 3 5 2]
        l1,l2,l3=c,a,b

else:  #when b is less than a
    if b <c:
        l1 =b
        if a <c:
            l2,l3=a,c
        else:
            l2,l3 = c,a
    else:  # c <b
        l1,l2,l3 = c,b,a

print(f"{l1} <= {l2} <={l3}")
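
A minimal sketch of the "rewrite by using list" assignment: put the three numbers in a list and let sort() do the comparisons that the nested if/else blocks above do by hand. The function name increasing_order is made up for this illustration.

```python
def increasing_order(a, b, c):
    """Return the three numbers as a list in increasing order."""
    numbers = [a, b, c]
    numbers.sort()        # sorts the list in place, smallest first
    return numbers

l1, l2, l3 = increasing_order(85, 75, 95)   # unpack the sorted list
print(f"{l1} <= {l2} <= {l3}")
```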

DAY 16: SEPTEMBER 3, 2022

#Dictionary
list1 = [4,5,6,7] #automatic position
dict1 = {"Harsh": 8.4,"Manish":8.12}
print(dict1["Harsh"])
#update to add another dictionary
dict2 = {"Sachin": 5.6, "Laxman": 7.2}
dict1.update(dict2)
print(dict1)
# FUT, QY, HY, SUT, FI (5%, 20%, 25%, 5%, 45%)
all_info ={}
for i in range(2): #2 students
    name = input("Enter Name: ")
    main_list=[]  #list of list
    for k in range(5):  # 5 types of exams
        t_list = []
        for j in range(3):
            marks = int(input("Enter marks: "))
            t_list.append(marks)

        main_list.append(t_list)
    t_dict = {name: main_list}
    all_info.update(t_dict)

#{"Manish": [[],[],[],[],[]]} - final output template
#Now we have the data

#to get all the keys:
keys = all_info.keys()
values = all_info.values()
items = all_info.items() # (key, value)

apply_std = [5, 20, 25, 5, 45]
#updated marks will be stored in another dictionary:
final_marks ={} #{name: []}
for k,v in all_info.items(): #[[5,7,8],[55,55,55],[66,66,66],[9,8,7],[88,88,88]]
    updated_marks=[]
    for i in range(3): #for the 3 subjects
        add = 0
        for j in range(5): #5 exams
            add = add + v[j][i] * apply_std[j]/100 #v[0][i]*apply_std[0] + v[1][i]*apply_std[1] + ...
        updated_marks.append(add)
    final_marks.update({k:updated_marks})


DAY 17: 11 SEP 2022

#Function
def mystatements():
    print("Hello")
    print("How are you doing?")
    print("Good morning")
def mystatements2(name,greeting): #required & positional
    print("Hello",name)
    print("How are you doing?")
    print(greeting)

def mystatements3(name, greeting="Good Morning"): #default & positional
    #name is required and greeting is default
    #required parameters are given before default
    print("Hello",name)
    print("How are you doing?")
    print(greeting)
    return 100

mystatements()
result = mystatements2("Sachin","Good Morning")
print(result)  #None is returned
result = mystatements3("Sachin")
print(result)

#function to take 2 numbers as input and
# perform add, sub,multiplication & division
# create - 2 functions: 1)required positional
# 2)default where numbers are 99 & 99
#return 4 answers as tuple
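A minimal sketch of how this exercise could be solved. The function names `calculate` and `calculate_default` are made up for illustration; the second one only adds default values and reuses the first.

```python
def calculate(num1, num2):  # required & positional
    """Return sum, difference, product and quotient as one tuple."""
    return num1 + num2, num1 - num2, num1 * num2, num1 / num2

def calculate_default(num1=99, num2=99):  # default values
    """Same calculation, but both numbers fall back to 99."""
    return calculate(num1, num2)

print(calculate(10, 5))
print(calculate_default())

# The tuple can be unpacked into four separate variables.
add, sub, mul, div = calculate_default(20)
print(add, sub, mul, div)
```

Note that passing 0 as the second number raises ZeroDivisionError, so a real program would check for that before dividing.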


admin 0 Comments

Python Programming Uncategorized

Jul 21 2022

OSError errno22 invalid argument in Python

What is OSError?
OSError is the exception class named in the message OSError: [Errno 22] Invalid argument. It is a built-in Python exception that is raised when a system-level operation fails, and I/O failures are a common cause.

For example, OSError is raised when the disk is full or a file cannot be found. Its subclasses include BlockingIOError, ChildProcessError, ConnectionError, FileExistsError, FileNotFoundError, etc. Since Python 3.3, EnvironmentError and IOError are kept only as aliases of OSError.

What is errno22 invalid argument?
As the name suggests, invalid argument errors occur when an invalid argument is passed to a function. If a function was expecting an argument of a particular data type but instead received an argument of a different data type, it will throw an invalid argument error.

import tensorflow as tf
tf.reshape(1,2)

This code will raise invalid argument error. The tf.reshape() function was expecting a tensor as an argument. But instead, it received 1 and 2 as the argument.

‘OSError : [errno22] invalid argument’ while using read_csv()
read_csv() is a pandas function used to read a CSV file in Python. We can read a CSV file from a URL or from the local disk. While reading a CSV file with read_csv(), Python can throw the OSError: [Errno 22] Invalid argument error.

Let us try to understand it with the help of an example. The code below was executed in the Python shell to access a local file. First, we import the pandas library to use read_csv().

import pandas as pd
file = pd.read_csv("C:\textfile.csv")

The above line of code will raise the below error.

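OSError: [Errno 22] Invalid argument: 'C:\textfile.csv'
The reason behind the error is that Python treats the backslash in an ordinary string as the start of an escape sequence. Here \t is read as a tab character, so the path passed to the operating system contains a tab and is rejected as an invalid argument. The simplest fix is to replace the backslash with a forward slash. A raw string (r"C:\textfile.csv") or a doubled backslash ("C:\\textfile.csv") also works.

Correct method:
file = pd.read_csv("C:/textfile.csv")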
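A quick way to see what Python does with the backslash in this path is to inspect the string itself, without touching any file. The sketch below compares four spellings of the same path; in an ordinary string, `\t` is the escape sequence for a tab character.

```python
plain = "C:\textfile.csv"      # "\t" is read as a TAB character
raw = r"C:\textfile.csv"       # raw string: the backslash is kept as-is
escaped = "C:\\textfile.csv"   # doubled backslash: one literal backslash
forward = "C:/textfile.csv"    # forward slash: no escaping involved

print(repr(plain))             # the repr shows the tab as \t
print("\t" in plain)           # does the path contain a tab?
print(raw == escaped)          # both spell the same Windows path
print(len(plain), len(raw))    # the broken path is one character shorter
```

Any of the last three spellings gives a usable path; only the first one hides a tab inside the file name.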

‘OSError : [errno22] invalid argument’ while using open()
We can get OSError : [errno22] invalid argument error while opening files with the open() function. The open() function in python is used for opening a file. It returns a file object. Thus, we can open the file in read, write, create or append mode.
Let us understand the error by taking an example. We shall try to open a .txt file in read mode using open(). The file would be returned as an object and saved in variable ‘f’.

f = open("C:\textfile.txt","r")

The code will throw the below error.

Traceback (most recent call last):
File “”, line 1, in
f = open("C:\textfile.txt","r")
OSError: [Errno 22] Invalid argument: 'C:\textfile.
The OSError: [Errno 22] Invalid argument error has been thrown for the same reason as before. Here too, Python reads the \t in the path as a tab character instead of a backslash followed by the letter t. On replacing the backslash with a forward slash, the error will be resolved.

Correct format:
f = open("C:/textfile.txt","r")

‘OSError : [errno22] invalid argument’ while reading image using open()
The above error can appear while opening an image using the open() function even though the backslash character has been replaced with forward slash. Let us see the error using an example.

image = open("C:/image1.jpg")

The error thrown would be:

Traceback (most recent call last):
File “”, line 1, in
image = open(“‪C:/image1.jpg”)
OSError: [Errno 22] Invalid argument: ‘\u202aC:/image1.jpg’
This error mainly occurs when the file path has been copied and pasted. Invisible Unicode characters sometimes get copied along with the path when we copy it from our local system or the internet.

The Unicode character '\u202a' in the above example is not visible in the file pathname. '\u202a' is the Unicode control character LEFT-TO-RIGHT EMBEDDING. Because it is not a valid part of the path, it causes the above OSError invalid argument.

The solution to this is straightforward. We simply have to type the path manually instead of copying it. Thus, the Unicode character will no longer be in the path and the error will be resolved.
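If retyping the path is not practical, the invisible character can also be removed in code. The helper below is a hypothetical sketch (the name `strip_format_chars` is not from any library): it drops every character in Unicode category "Cf" (format characters), which includes U+202A.

```python
import unicodedata

def strip_format_chars(path):
    """Remove invisible Unicode format characters (category Cf) from a path."""
    return "".join(ch for ch in path if unicodedata.category(ch) != "Cf")

copied = "\u202aC:/image1.jpg"   # path as it arrives after copy and paste
cleaned = strip_format_chars(copied)

print(repr(copied))    # repr makes the hidden character visible
print(repr(cleaned))
```

Printing the path with repr() before opening a file is a cheap way to spot such hidden characters in the first place.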

What do you think? Please share in the comment section.


Jul 9 2022

How to add a Machine Learning Project to GitHub

Maintaining a GitHub data science portfolio is essential for data science professionals and students, because it showcases their skills and projects.

Steps to add an existing Machine Learning Project in GitHub

Step 1: Install GIT on your system

We will use the git command-line interface which can be downloaded from:

https://git-scm.com/book/en/v2/Getting-Started-Installing-Git

Step 2: Create GitHub account here:

https://github.com/

Step 3: Now we create a repository for our project. It’s always a good practice to initialize the project with a README file.

Step 4: Go to the Git folder located in Program Files\Git and open the git-bash terminal.

Step 5: Now navigate to the Machine Learning project folder using the following command.

cd PATH_TO_ML_PROJECT

Step 6: Type the following git initialization command to initialize the folder as a local git repository.

git init

We should get the message “Initialized empty Git repository in your path”, and a .git folder, which is hidden by default, will be created.

Step 7: Add files to the staging area for committing using this command which adds all the files and folders in your ML project folder.

git add .

Note: git add filename.extension can also be used to add individual files.

Step 8: We will now commit the files from the staging area and add a message to our commit. It is always good practice to write meaningful commit messages, which will help us understand the commits during future visits and revisions. Type the following command for your first commit.

git commit -m "Initial project commit"

Step 9: This only records our files in the local repository on our system, so we still have to link it with our remote repository on GitHub. To link them, go to the GitHub repository we created earlier and copy the remote link under “..or push an existing repository from the command line”.

First, get the URL of the GitHub project.

Now, in the git-bash window, paste the command below followed by your remote repository’s URL.

git remote add origin YOUR_REMOTE_REPOSITORY_URL

Step 10: Finally, we have to push the local repository to the remote repository on GitHub. If your local branch is named main instead of master, use that name in the command.

git push -u origin master

Sign into your github account

Authorize GitCredentialManager

After this, the Machine Learning project will be added to your GitHub with the files.

We have successfully added an existing Machine Learning Project to GitHub. Now is the time to create your GitHub portfolio by adding more projects to it.


Jun 8 2022

ReEvolution Coffee Shop (Business Analysis Case Study)

Join the love revolution in re-Evolution Coffee houses!

Press release:

The next revolutionary step in the evolution of coffee house culture is the way you get your coffee. Re-Evolution Coffee shops chain MD Dirk Dash explains: “If you’d rather not queue up for your coffee at the counter, at re-Evolution Coffee Houses up and down the UK you will be able to just sit yourself down, relax and our service staff will be at your side ready to take your order, sending it instantly to the re-Evolution Coffee Wizards who will work their magic and notify the staff to deliver it to you – effortlessly… our service staff will be using the latest and best wireless technology to provide the ultimate coffee house customer experience, so relax and join the love revolution – but only in re-Evolution Coffee Houses!”

Email

From:

Dirk Dash (MD re-Evolution Coffee Houses)

to:

Adele Ash (Sales & Service Director)

Ben Bash (IT Director)

Carla Cash (Finance Director)

Subject: at seat service development & launch

Guys,

Check out the attached press release – as you know we are planning to launch in 3 months so it’s time to crack on and get this delivered.

This is a big venture for us, but it differentiates us from the competition and, if we get it right, we can get the lead in the market place…so there is a need for speed but given the likely costs and risks – money and reputation – we need to get it right first time or our shareholders will be unimpressed.

As the management board, we just can’t afford another experience like the Streamline implementation.

Adele: you will need new staff and the existing ones are going to have to learn new ways of working with new technology – but this is all about customer service so make sure you let Ben know what the technology needs to do to enable your staff to deliver that service.

Ben: the wireless order taking and fulfilment capability is the backbone of this. The ‘at seat’ staff taking orders at seat can do the best job in the world but the Wizards must get the right orders in the right sequence at the right time or the ‘at seat’ staff can’t deliver excellent service.

Ben, Adele: you need to tell Carla what the indicative costs and timescales are going to be.

Carla: this initiative is key to company growth plans but not at any price, let me know as soon as you have some figures of costs and returns to show me.

This is a high risk, high reward opportunity – do it right!

Regards,

Dirk

Email

From:

Ben Bash (IT Director)

To:

Dirk Dash (MD re-Evolution Coffee Houses)

Adele Ash (Sales & Service Director)

Carla Cash (Finance Director)

Subject: Re: at seat service development & launch

Hi,

If we are going to avoid another Streamline fiasco, I suggest we start by defining exactly what it is that we’re changing in terms of systems and people. I have some business analysts already started on this to define the requirements, please make sure you make time for them.

Once we know what we’re doing here we can specify the technicalities of the wireless solution. At that point I can give some sensible cost estimates.

There is a lot we need to agree. For example one question I already have (and that makes a big difference to costs) is do we expect seated customers to be able to pay for their orders as they place them like those queuing at the counter do? We can do that but it means linking up to Streamline services via the order taking handheld and that will involve costs.

There are a shed load more questions like that which need to be answered before we can give Carla indicative costs.

Regards,

Ben

Email

From:

Adele Ash (Sales & Service Director)

To:

Dirk Dash (MD re-Evolution Coffee Houses)

Ben Bash (IT Director)

Carla Cash (Finance Director)

Subject: Re: at seat service development & launch

Hi,

Glad to hear about the business analysts, Ben. I don’t want to go through another change like Streamline! We will make available as much time as needed.

Overall I can tell you how it will work right now: A customer comes in to a shop and takes a seat. The waiting staff will give them time to select what they want and then approach to take their order, coming back if necessary. While they are taking the customer order they will recommend the promotions we are running as the opportunity allows. Once they have got the order it needs to be sent to the Wizards who will make up the orders in the sequence they receive them interspersed with counter orders. Ben – can the counter orders and table orders be put on one list so the Wizards don’t have to sort through two lists? When the order is ready the Wizards need to be able to tell the waiting staff who will pick the order up from the counter and deliver.

One thing to factor in here is recruitment of new staff, and training new and existing staff. Suggest we go for a train the trainer approach and I will draw up plans accordingly.

Payment – my belief is payments will have to be made at the counter (if agreed we could look at setting up a separate payment queue for this). I think taking payment as customers place the order will take up too much time and over complicate things.

Maybe make that a next stage thing?

Regards,

Adele

From:

Carla Cash (Finance Director)

To:

Dirk Dash (MD re-Evolution Coffee Houses)

Adele Ash (Sales & Service Director)

Ben Bash (IT Director)

Subject: Re: at seat service development & launch

Hi,

Ben, Adele, give me costs and benefits whenever you can, caveat them however you like. We can start building a benefits case as soon as we have some figures and then refine them as we go.

By the way, we have an opportunity here to get some good management information – things like better shop performance figures and new information like average order fulfilment time, promotions uptakes and so on. It might even be worth tracking individual staff performance…is this going to be possible? The benefits would be much better reactive management of local issues and much better future planning by shop and for the whole chain…

Regards,

Carla

From:

Dirk Dash (MD re-Evolution Coffee Houses)

To:

Adele Ash (Sales & Service Director)

Ben Bash (IT Director)

Carla Cash (Finance Director)

Subject: Re: at seat service development & launch

Guys,

Great start!

I would like a presentation of how we see all this working in 3 weeks. Just show me the overall solution not the detail.

To answer some of the questions that have already come up:

Payment taking at seat is out – we’ll make it next stage as Adele suggests
Management information is in: you’re right Carla it’s a great opportunity!

Good work team, crack on!

Regards,

Dirk


Jun 3 2022

Monte Carlo Simulation

Monte Carlo simulation is a computerized mathematical technique to generate random sample data based on some known distribution for numerical experiments. This method is applied to risk quantitative analysis and decision making problems. This method is used by the professionals of various profiles such as finance, project management, energy, manufacturing, engineering, research & development, insurance, oil & gas, transportation, etc.

This method was first used by scientists working on the atom bomb in the 1940s. It can be used in situations where we need to make estimates and decisions under uncertainty, such as weather forecasting.

The Monte Carlo Simulation Formula

We would like to accurately estimate the probabilities of uncertain events. For example, what is the probability that a new product’s cash flows will have a positive net present value (NPV)? What is the risk factor of our investment portfolio? Monte Carlo simulation enables us to model situations that present uncertainty and then play them out on a computer thousands of times.
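The NPV question above can be played out in a few lines of Python. This is only a sketch under invented assumptions: the initial cost, the five yearly cash flows drawn from a normal distribution, and the 10% discount rate are all made-up numbers, and the function name `simulate_npv` is hypothetical.

```python
import random

def simulate_npv(rng, initial_cost=1000.0, years=5, discount_rate=0.10,
                 mean_cash_flow=280.0, sd_cash_flow=80.0):
    """One trial: draw uncertain yearly cash flows and discount them."""
    npv = -initial_cost
    for year in range(1, years + 1):
        cash_flow = rng.gauss(mean_cash_flow, sd_cash_flow)
        npv += cash_flow / (1 + discount_rate) ** year
    return npv

rng = random.Random(42)          # fixed seed so the run is repeatable
trials = 10_000
results = [simulate_npv(rng) for _ in range(trials)]

# Share of trials in which the project ended with a positive NPV.
prob_positive = sum(npv > 0 for npv in results) / trials
mean_npv = sum(results) / trials

print("Estimated P(NPV > 0):", round(prob_positive, 3))
print("Average NPV:", round(mean_npv, 2))
```

The more trials we run, the closer the estimated probability gets to the true probability implied by the assumed distributions.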

Many companies use Monte Carlo simulation as an important part of their decision-making process. Here are some examples.

General Motors, Procter and Gamble, Pfizer, Bristol-Myers Squibb, and Eli Lilly use simulation to estimate both the average return and the risk factor of new products. At GM, this information is used by the CEO to determine which products come to market.
GM uses simulation for activities such as forecasting net income for the corporation, predicting structural and purchasing costs, and determining its susceptibility to different kinds of risk (such as interest rate changes and exchange rate fluctuations).
Lilly uses simulation to determine the optimal plant capacity for each drug.
Procter and Gamble uses simulation to model and optimally hedge foreign exchange risk.
Sears uses simulation to determine how many units of each product line should be ordered from suppliers—for example, the number of pairs of Dockers trousers that should be ordered this year.
Oil and drug companies use simulation to value “real options,” such as the value of an option to expand, contract, or postpone a project.
Financial planners use Monte Carlo simulation to determine optimal investment strategies for their clients’ retirement.

Download Excel:

MonteCarlo-Simulation Download


May 26 2022

Exponential Smoothing Forecasting – Examples

Example 1:

Exponential Smoothing Forecasting – Example

Let’s consider α=0.2 for the above-given data values, so enter the value 0.8 in the Damping Factor box and again repeat the Exponential Smoothing method.

The result is shown below:

Exponential Smoothing Forecasting – Example #2

Let’s consider α=0.8 for the above-given data values so enter the value 0.2 in the Damping Factor box and again repeat the Exponential Smoothing method.

The result is shown below:

Now, if we compare the results of all the above 3 Excel Exponential Smoothing examples, then we can come up with the below conclusion:

When the alpha (α) value is smaller, the damping factor is higher. As a result, the peaks and valleys are smoothed out more.
When the alpha (α) value is higher, the damping factor is smaller. As a result, the smoothed values are closer to the actual data points.
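The relationship between α and the damping factor can be checked outside Excel as well. The sketch below uses the textbook recurrence s_t = α·x_t + (1 − α)·s_(t−1), where 1 − α plays the role of the damping factor. The data values are made up, and the Excel tool may shift its output by one period compared to this version.

```python
def exponential_smoothing(values, alpha):
    """Return the smoothed series; the first value is used as the start."""
    damping = 1 - alpha              # what Excel calls the damping factor
    smoothed = [values[0]]
    for actual in values[1:]:
        smoothed.append(alpha * actual + damping * smoothed[-1])
    return smoothed

data = [10, 20, 10, 30, 10, 40]      # invented series with peaks and valleys

low_alpha = exponential_smoothing(data, 0.2)    # damping factor 0.8
high_alpha = exponential_smoothing(data, 0.8)   # damping factor 0.2

print("alpha=0.2:", [round(v, 2) for v in low_alpha])
print("alpha=0.8:", [round(v, 2) for v in high_alpha])
```

Comparing the two printed series shows the effect described above: the α=0.2 series moves within a much narrower band than the α=0.8 series.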

Things to Remember

A higher damping factor smooths out the peaks and valleys in the dataset more.
Excel Exponential Smoothing is a very flexible method to use and easy to calculate.


May 14 2022

Multiple Linear Regression Code

Multiple linear regression (MLR) is a statistical technique that uses several explanatory variables to predict the outcome of a response variable. A linear regression model that contains more than one predictor variable is called a multiple linear regression model. The goal of multiple linear regression (MLR) is to model the relationship between the explanatory and response variables.

The model for MLR, given n observations, is:
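In the usual textbook notation (assumed here, since the text does not define its own symbols), with p explanatory variables, coefficients β and an error term ε for each observation i, the model can be written as:

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \epsilon_i,
\qquad i = 1, \dots, n
```

Here y_i is the response, x_{i1} to x_{ip} are the explanatory variables, β_0 is the intercept and β_1 to β_p are the slope coefficients.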

Let’s take an example:

The dataset has 5 columns containing extracts from the Profit and Loss statements of 50 startup companies. It gives each company's R&D, Admin and Marketing spend, the state in which the company is based and the profit the company realized in that year. A venture capitalist (VC) would be interested in such data and would want to see whether factors like R&D Spend, Admin expenses, Marketing spend and State play any role in the profitability of a startup. This analysis would help the VC make investment decisions in future.

Profit is the dependent variable and other variables are independent variables.

Dummy Variables

Let’s look at the dataset we have for this example:

One challenge we face while building the linear model is handling the State variable. The State column holds categorical values and cannot be treated like a numeric column. We need to add dummy variables for each categorical value like below:

Add one column for each of the 3 categorical values of State. Put 1 in the column whose header matches the row's State value and 0 in the others. A row containing New York will have 1 under the New York column and 0 under the California and Florida columns. The three additional columns are called dummy variables and these will be used in our model building, so the original State column can be ignored. We can also drop one of the dummy columns, for example New York, because a row that has zero under both California and Florida implicitly belongs to New York. We always use one less dummy variable than the total number of categories to avoid the dummy variable trap.
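The dummy-variable idea can be tried on a tiny table before touching the real dataset. The rows below are invented for illustration; only the three state names come from the text. The sketch uses pandas get_dummies, which is a different route from the scikit-learn encoders used later. With drop_first=True pandas drops the alphabetically first category (California) rather than New York, but either choice avoids the dummy variable trap.

```python
import pandas as pd

# Made-up rows; only the State values mirror the dataset described above.
df = pd.DataFrame({
    "State": ["New York", "California", "Florida", "New York"],
    "Profit": [100.0, 120.0, 90.0, 110.0],
})

# One 0/1 column per state.
all_dummies = pd.get_dummies(df["State"], dtype=int)
print(all_dummies)

# One column fewer: the dropped state is implied when all others are 0.
reduced = pd.get_dummies(df["State"], drop_first=True, dtype=int)
print(reduced)
```

In the reduced table, a row with 0 in every remaining column belongs to the dropped category.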

Python code:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
#Dataset
data_df = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/3_Startups.csv")
#Getting X and Y values
X = data_df.iloc[:, :-1].values
y = data_df.iloc[:, -1].values

#Encoding the categorical variables:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
#Change the text into numbers 0,1,2 – 4th column
X[: ,3]= labelencoder_X.fit_transform(X[: ,3])
#create dummy variables
from sklearn.compose import ColumnTransformer
transformer = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [3])],remainder='passthrough')
#Now a little fit and transform
#np.float was removed in NumPy 1.24; the built-in float is equivalent
X = np.array(transformer.fit_transform(X), dtype=float)
#4 Avoid the dummy variables trap
#Delete the first dummy column (California, because the encoder orders categories alphabetically)
X= X[:, 1:]

#Split into training and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

#Train the Algorithm
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
#The y_pred is a numpy array that contains all predicted values
#compare actual output values for X_test with predicted values
output_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
print("Actual v Predicted: \n",output_df)
####
import numpy as np
from sklearn import metrics
explained_variance=metrics.explained_variance_score(y_test, y_pred)
mean_absolute_error=metrics.mean_absolute_error(y_test, y_pred)
mse=metrics.mean_squared_error(y_test, y_pred)
mean_squared_log_error=metrics.mean_squared_log_error(y_test, y_pred)
median_absolute_error=metrics.median_absolute_error(y_test, y_pred)
r2=metrics.r2_score(y_test, y_pred)
print('Explained_variance: ', round(explained_variance,2))
print('Mean_Squared_Log_Error: ', round(mean_squared_log_error,2))
print('R-squared: ', round(r2,4))
print('Mean Absolute Error(MAE): ', round(mean_absolute_error,2))
print('Mean Squared Error (MSE): ', round(mse,2))
print('Root Mean Squared Error (RMSE): ', round(np.sqrt(mse),2))
from statsmodels.api import OLS
import statsmodels.api as sm
#In our model, y will be dependent on 2 values: coefficient
# and constant, so we need to add additional column in X for
#constant value
X = sm.add_constant(X)
summ = OLS(y, X).fit().summary()
print("Summary of the dataset: \n",summ)

Output:

In above table, x1 and x2 are the dummy variables for state, x3 is R&D, x4 is Administration, x5 is the marketing spends.

How many independent variables to consider?

We need to be careful to choose which ones we need to keep for input variables. We do not want to include all the variables for mainly 2 reasons:

GIGO: If we feed garbage to our model we will get garbage out so we need to feed in right set of data
Justifying the input: Can we justify the inclusion of all the data, if no, then we should not include them.

There are 4 methods to build a multiple linear model:

Select all in
Backward Elimination
Forward Selection
Bidirectional Elimination

Select-all-in: We select all the independent variables because we know that all variables impact the result or you have to because business leaders want you to include them.

Backward Elimination:

1. Select a significance level to stay in the model (e.g. SL = 0.05; predictors with a higher P-value are removed).
2. Fit the full model with all possible predictors.
3. Consider the predictor with the highest P-value. If P > SL, go to step 4, otherwise go to step 5.
4. Remove the predictor, refit the model and go to step 3.
5. Your model is ready!

Forward Selection:

1. Select a significance level to enter the model (e.g. SL = 0.05; predictors with a lower P-value are kept).
2. Fit all the simple regression models and select the one with the lowest P-value.
3. Keep this variable and fit all possible models with one extra predictor added to the ones you already have (at first, all two-variable linear regressions).
4. Consider the predictor with the lowest P-value. If P < SL, go to step 3, otherwise go to step 5.
5. Keep the previous model!

Bi-directional Selection: It is a combination of Forward selection and backward elimination:

1. Select a significance level to enter and to stay in the model (e.g. SLE = SLS = 0.05).
2. Perform the next step of Forward Selection (new variables must have P < SLE to enter).
3. Perform all the steps of Backward Elimination (old variables must have P < SLS to stay).
4. Iterate between steps 2 and 3 until no new variables can enter and no old variables can exit.

In the multiple regression example, since we have already run the model with all the attributes, let’s implement the backward elimination method here and remove the attributes that are not useful for us. Let’s have another look at the stats summary:

Look for the highest p-value and remove that variable. In this case x2 (the second dummy variable) has the highest one (0.990). Now, we will remove this variable from X and re-run the model.

X_opt= X[:, [0,1,3,4,5]]
regressor_OLS=sm.OLS(endog = y, exog = X_opt).fit()
summ =regressor_OLS.summary()
print("Summary of the dataset after elimination 1: \n",summ)

Output Snapshot:

Look at the highest p-value again. The first dummy variable, x1, has a p-value of 0.940, so remove this one. Even though this value was also high in the previous step, as per the algorithm we remove only one variable at a time, since removing an attribute can change the p-values of the other attributes. Re-run the code again:

X_opt= X[:, [0,3,4,5]]
regressor_OLS=sm.OLS(endog = y, exog = X_opt).fit()
summ = regressor_OLS.summary()
print("Summary of the dataset after elimination 2: \n",summ)

Admin spends (x2) has the highest p-value (0.602). Remove this as well.

X_opt= X[:, [0,3,5]]
regressor_OLS=sm.OLS(endog = y, exog = X_opt).fit()
summ = regressor_OLS.summary()
print("Summary of the dataset after elimination 3: \n",summ)

Marketing spend (x2) has the highest p-value (0.06). This value is low, but since we have selected the significance level (SL) as 0.05, we need to remove this as well.

X_opt= X[:, [0,3]]
regressor_OLS=sm.OLS(endog = y, exog = X_opt).fit()
summ =regressor_OLS.summary()
print("Summary of the dataset after elimination 4: \n",summ)

Finally, we see that only one factor has a significant impact on the profit of these startups: R&D spend. The accuracy of the model has also increased. When we included all the attributes, the R-squared value was 0.9347 and now it is at 0.947.

The word “linear” in “multiple linear regression” refers to the model being linear in its coefficients. Whether such a model suits a dataset depends on the criteria discussed in the next section.

Test for Linearity

The next question we need to understand is when we can and cannot perform linear regression. In this section, let’s understand the assumptions of linear regression in detail. One of the most essential steps to take before applying linear regression and depending solely on accuracy scores is to check these assumptions; only when a dataset meets them can we say that the dataset is suitable for a linear regression model.

For the analysis, we will take the same dataset we used for Multiple Linear Regression Analysis in the previous section.

import pandas as pd
#Dataset
data_df = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/3_Startups.csv")

Before we apply regression on all these attributes, we need to understand whether we really need to take all of them into consideration. There are two things we need to consider:

The first step is to test whether the dataset fits the definition of linearity, for which we will perform different tests in this section. Remember, we only test the numerical columns; the categorical columns are not taken into account. Categorical values are converted into dummy variables with values 0 and 1, and dummy variables meet the assumption of linearity by definition, because they create two data points, and two points define a straight line. There is no such thing as a non-linear relationship for a single variable with only two values.

Code for Prediction: Let’s rewrite the code

import numpy as np
X = data_df.iloc[:,:-1].values
y = data_df.iloc[:,-1].values

#handling categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
le_x = LabelEncoder()
X[:,3] = le_x.fit_transform(X[:,3])
from sklearn.compose import ColumnTransformer
transformer = ColumnTransformer([('one_hot_encoder', OneHotEncoder(),[3])], remainder='passthrough')
X = np.array(transformer.fit_transform(X), dtype=float)
X=X[:,1:]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=0)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train , y_train)
y_pred = regressor.predict(X_test)
output_df = pd.DataFrame({"Actual":y_test, "Predicted": y_pred})

Let’s perform the tests now:

1. Linearity

Linear regression needs the relationship between the independent and dependent variables to be linear. Let’s use a pair plot to check the relation of independent variables with the profit variable.

Output:

Python Code:

import seaborn as sns
import matplotlib.pyplot as plt

# visualize the relationship between the features and the response using scatterplots
p = sns.pairplot(data_df, x_vars=['R&D Spend','Administration','Marketing Spend'], y_vars='Profit', height=5, aspect=0.7)
plt.show()

Looking at the plots, we can see that R&D Spend forms a clearly linear shape and Marketing Spend is somewhat linear, while Administration is scattered all over the graph but still shows an increasing trend as the Profit value increases on the Y-axis. So we can use linear regression models here.

2. Variables follow a Normal Distribution

The variables (X) follow a normal distribution. In other words, we want to make sure that for each x value, y is a random variable following a normal distribution and its mean lies on the regression line. One of the ways to visually test this assumption is the Q-Q plot. Q-Q stands for Quantile-Quantile, and the plot is a technique to compare two probability distributions visually. To generate this Q-Q plot we will use scipy’s probplot function, which compares a variable of our choosing against a normal distribution.

import scipy.stats as stats
#column index 2 is R&D Spend (indices 0 and 1 hold the state dummy variables)
stats.probplot(X[:,2], dist="norm", plot=plt)
plt.show()

The points must lie on the red line to conclude that the variable follows a normal distribution. For the third column, which is R&D Spend, they do! A couple of points fall off the line because of our small sample size. In practice, you decide how strict you want to be, as it is a visual test.

3. There is no or little multicollinearity

Multicollinearity means that the independent variables are highly correlated with each other. X’s are called independent variables for a reason. If multicollinearity exists between them, they are no longer independent and this generates issues when modeling linear regressions.

To visually test for multicollinearity we can use the power of Pandas. We will use the Pandas corr function to compute the pairwise correlation of our columns. If any pair of different columns has a correlation whose absolute value is >=0.8, the multicollinearity assumption is broken.

#convert to a pandas dataframe
import pandas as pd
df = pd.DataFrame(X)
df.columns = ['x1','x2','x3','x4','x5']
#generate correlation matrix
corr = df.corr()
#plot heatmap
p = sns.heatmap(corr, annot=True, cmap='RdYlGn', square=True)
plt.show()
print("Correlation Matrix:\n", corr)
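Reading the heatmap by eye gets harder as the number of columns grows, so the 0.8 rule can also be applied in code. Below is a minimal sketch; the helper name high_correlation_pairs and the synthetic columns are assumptions made for illustration.

```python
import numpy as np

def high_correlation_pairs(X, threshold=0.8):
    """List (i, j, corr) for column pairs with |corr| >= threshold."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    n = corr.shape[0]
    return [(i, j, corr[i, j])
            for i in range(n) for j in range(i + 1, n)
            if abs(corr[i, j]) >= threshold]

# Synthetic columns (assumptions for illustration only)
rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = a + rng.normal(scale=0.1, size=300)   # almost a copy of a
c = rng.normal(size=300)                  # unrelated to a and b

pairs = high_correlation_pairs(np.column_stack([a, b, c]))
for i, j, value in pairs:
    print("Columns", i, "and", j, "are highly correlated:", value)
```

Only pairs of different columns are checked, because every column has a correlation of 1 with itself.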

4. Check for Homoscedasticity

The data needs to be homoscedastic, meaning the residuals have equal or almost equal variance across the regression line. If we plot the error terms against the predicted values, there should not be any pattern in the error terms.

#produce regression plots
import statsmodels.api as sm
X = sm.add_constant(X)  #add the intercept column
model = sm.OLS(y, X).fit()
summ = model.summary()
print("Summary of the model: \n", summ)
fig = plt.figure(figsize=(12,8))
#Checking for x3 (R&D Spend)
fig = sm.graphics.plot_regress_exog(model, 'x3', fig=fig)
plt.show()

Four plots are produced. The one in the top right corner is the residuals versus predictor plot. The x-axis on this plot shows the values of the predictor variable and the y-axis shows the residual for each value. Since the residuals appear to be randomly scattered around zero, this is an indication that heteroscedasticity is not a problem with the predictor variable x3 (R&D Spend). In Multiple Regression, we need to create this plot for each of the predictor variables.
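As a rough numeric companion to the plot, we can compare the residual variance in the lower half of the fitted values with the variance in the upper half. This is only a crude check in the spirit of the Goldfeld-Quandt test, not a formal test, and the helper name variance_ratio and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def variance_ratio(fitted, residuals):
    """Residual variance for high fitted values divided by that for low ones."""
    fitted = np.asarray(fitted, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    order = np.argsort(fitted)            # sort observations by fitted value
    half = len(order) // 2
    low = residuals[order[:half]]
    high = residuals[order[-half:]]
    return np.var(high, ddof=1) / np.var(low, ddof=1)

# Synthetic residuals (assumptions for illustration only)
rng = np.random.default_rng(0)
fitted = np.linspace(1, 100, 400)
homo_residuals = rng.normal(0, 5, 400)              # constant spread
hetero_residuals = rng.normal(0, 1, 400) * fitted   # spread grows with fitted value

ratio_homo = variance_ratio(fitted, homo_residuals)
ratio_hetero = variance_ratio(fitted, hetero_residuals)
print("Ratio with constant spread:", ratio_homo)
print("Ratio with growing spread:", ratio_hetero)
```

A ratio close to 1 is consistent with homoscedasticity, while a ratio far from 1 suggests that the spread of the residuals changes along the regression line.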

5. Mean of Residuals

Residuals as we know are the differences between the true value and the predicted value. One of the assumptions of linear regression is that the mean of the residuals should be zero. So let’s find out.

residuals = y_test-y_pred
mean_residuals = np.mean(residuals)
print("Mean of Residuals {}".format(mean_residuals))

Output:

Mean of Residuals 3952.010244810798
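Note that the residuals above are computed on the test set (y_test - y_pred). For ordinary least squares with an intercept, the mean of the residuals is zero on the data the model was fitted on, but it need not be zero on unseen data. Below is a minimal sketch on synthetic data that shows the difference; the data and the variable names are assumptions made for illustration.

```python
import numpy as np

# Synthetic data (assumption for illustration only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 60)
y = 2.0 * x + 1.0 + rng.normal(0, 1, 60)

# Fit on the first 40 points; the column of ones is the intercept
A_train = np.column_stack([np.ones(40), x[:40]])
coef, *_ = np.linalg.lstsq(A_train, y[:40], rcond=None)

train_residuals = y[:40] - A_train @ coef

# Evaluate on the remaining 20 points
A_test = np.column_stack([np.ones(20), x[40:]])
test_residuals = y[40:] - A_test @ coef

print("Mean of training residuals:", train_residuals.mean())
print("Mean of test residuals:", test_residuals.mean())
```

The training mean comes out as zero up to floating point error, while the test mean is generally a small non-zero number. So the test-set mean should be judged against the scale of the target variable.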

6. Check for Normality of error terms/residuals

p = sns.histplot(residuals, kde=True)  #distplot is deprecated in recent seaborn versions
p = plt.title('Normality of error terms/residuals')
plt.show()

The residual terms are pretty much normally distributed for the number of test points we took.
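The histogram is again a visual check. One common numeric companion is the Shapiro-Wilk test from scipy. Below is a minimal sketch on synthetic residuals; the variable example_residuals is an assumption made for illustration, and with the real model you would pass the residuals computed above.

```python
import numpy as np
import scipy.stats as stats

# Synthetic residuals (assumption for illustration only)
rng = np.random.default_rng(0)
example_residuals = rng.normal(0, 1, 100)

# shapiro returns the test statistic W and the p-value
statistic, p_value = stats.shapiro(example_residuals)
print("Shapiro-Wilk statistic:", statistic)
print("p-value:", p_value)
```

A statistic close to 1 together with a p-value above the chosen significance level (for example 0.05) means the test found no evidence against normality.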


May 13 2022

DataSets

Credit Card dataset for R Project : Download from here and unzip for the CSV file
Attitude_Survey.csv :

attitude_survey Download


May 8 2022

Simple Linear Regression or Bivariate Regression Problems to Practice

Download working excel from here :

Bivariate-12-May-2022 Download

Problem Statement 1:

Solution:

Step 1: Identify

Step 2: Compute (Manually):

Step 3: Interpret

Step 4: Assess

Solve using:

Solve using R Program:

Solve using Python


NOTE: from statsmodels.tsa.arima_model import ARIMA is no longer used; instead use:

statsmodels.tsa.arima.model.ARIMA

For future periods, use forecast() instead of predict(): predict() starts from the initial period by default, while forecast() takes the number of periods in the future.

Machine Learning with R

The Monte Carlo Simulation Formula

Exponential Smoothing Forecasting – Example

Exponential Smoothing Forecasting – Example #2

Things to Remember
