Swapnil Saurav

File Handling in Python

A. Reading from and Writing to Files:

Reading from Files (open() and read()):

  • To read from a file, you can use the open() function in Python, which opens a file and returns a file object. The read() method is used to read the contents of the file.
  • Syntax for Reading:

python

# Reading from a file
file = open('file.txt', 'r')  # Opens the file in read mode ('r')
content = file.read()         # Reads the entire file content
print(content)
file.close()                  # Close the file after reading

Writing to Files (open() and write()):

  • To write to a file, open it with the appropriate mode ('w' for write, 'a' for append). The write() method is used to write content to the file.
  • Syntax for Writing:

python

# Writing to a file
file = open('file.txt', 'w')   # Opens the file in write mode ('w')
file.write('Hello, World!\n')  # Writes content to the file
file.close()                   # Close the file after writing

B. File Modes and Operations:

File Modes:

  • Read Mode ('r'): Opens a file for reading. Raises an error if the file does not exist.
  • Write Mode ('w'): Opens a file for writing. Creates a new file if it doesn't exist or truncates the file if it exists.
  • Append Mode ('a'): Opens a file for appending new content. Creates a new file if it doesn't exist.
  • Read and Write Mode ('r+'): Opens a file for both reading and writing.
  • Binary Mode ('b'): Used in conjunction with other modes (e.g., 'rb', 'wb') to handle binary files.
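As a quick sketch of how the three most common modes differ (the filename notes.txt is only an illustration), 'w' truncates, 'a' appends, and 'r' reads back:

```python
# 'notes.txt' is a throwaway example file
f = open('notes.txt', 'w')    # 'w' creates the file or truncates an existing one
f.write('first line\n')
f.close()

f = open('notes.txt', 'a')    # 'a' appends without truncating
f.write('second line\n')
f.close()

f = open('notes.txt', 'r')    # 'r' raises an error if the file does not exist
content = f.read()
print(content)                # prints both lines
f.close()
```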

File Operations:

  • read(): Reads the entire content of the file or a specified number of bytes.
  • readline(): Reads a single line from the file.
  • readlines(): Reads all the lines of a file and returns a list.
  • write(): Writes content to the file.
  • close(): Closes the file when finished with file operations.
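The reading methods listed above can be compared on a small sample file (the filename sample.txt is just an illustration):

```python
# Build a small sample file first; 'sample.txt' is just an illustration
with open('sample.txt', 'w') as f:
    f.write('line 1\nline 2\nline 3\n')

f = open('sample.txt', 'r')
first = f.readline()    # one line, newline included: 'line 1\n'
rest = f.readlines()    # remaining lines as a list
f.close()
print(first, rest)
```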

Using with Statement (Context Manager):

  • The with statement in Python is used to automatically close the file when the block of code is exited. It’s a good practice to use it to ensure proper file handling.
  • Syntax:

python

with open('file.txt', 'r') as file:
    content = file.read()
    print(content)
# File is automatically closed outside the 'with' block

VII. Object-Oriented Programming (OOP) Basics

A. Classes and Objects:

Classes:

  • Classes are blueprints for creating objects in Python. They encapsulate data (attributes) and behaviors (methods) into a single unit.
  • Syntax for Class Declaration:

python

# Class declaration
class MyClass:
    # Class constructor (initializer)
    def __init__(self, attribute1, attribute2):
        self.attribute1 = attribute1
        self.attribute2 = attribute2

    # Class method
    def my_method(self):
        return "This is a method in MyClass"

Objects:

  • Objects are instances of classes. They represent real-world entities and have attributes and behaviors defined by the class.
  • Creating Objects from a Class:

python

# Creating an object of MyClass
obj = MyClass("value1", "value2")

B. Inheritance and Polymorphism:

Inheritance:

  • Inheritance allows a class (subclass/child class) to inherit attributes and methods from another class (superclass/parent class).
  • Syntax for Inheritance:

python

# Parent class
class Animal:
    def sound(self):
        return "Some sound"

# Child class inheriting from Animal
class Dog(Animal):
    def sound(self):  # Overriding the method
        return "Woof!"

Polymorphism:

  • Polymorphism allows objects of different classes to be treated as objects of a common superclass. It enables the same method name to behave differently for each class.
  • Example of Polymorphism:

python

# Polymorphism example
def animal_sound(animal):
    return animal.sound()  # Same method name, different behaviors

# Creating instances of classes
animal1 = Animal()
dog = Dog()

# Calling the function with different objects
print(animal_sound(animal1))  # Output: "Some sound"
print(animal_sound(dog))      # Output: "Woof!"

VIII. Error Handling (Exceptions) in Python Programming

A. Understanding Exceptions:

What are Exceptions?

  • Exceptions are errors that occur during the execution of a program, disrupting the normal flow of the code.
  • Examples include dividing by zero, trying to access an undefined variable, or attempting to open a non-existent file.

Types of Exceptions:

  • Python has built-in exception types that represent different errors that can occur during program execution, like ZeroDivisionError, NameError, FileNotFoundError, etc.
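As a sketch, each of the built-in exception types named above can be triggered deliberately and caught (the undefined name and the missing filename below are intentionally bogus):

```python
caught = []  # record which exception types we handled

try:
    result = 10 / 0
except ZeroDivisionError as e:
    caught.append('ZeroDivisionError')
    print("ZeroDivisionError:", e)

try:
    print(some_undefined_name)       # this name is deliberately not defined
except NameError as e:
    caught.append('NameError')
    print("NameError:", e)

try:
    open('no_such_file_12345.txt')   # deliberately missing file
except FileNotFoundError as e:
    caught.append('FileNotFoundError')
    print("FileNotFoundError:", e)
```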

B. Using Try-Except Blocks:

Handling Exceptions with Try-Except Blocks:

  • Try-except blocks in Python provide a way to handle exceptions gracefully, preventing the program from crashing when errors occur.
  • Syntax:

python

try:
    # Code that might raise an exception
    result = 10 / 0  # Example: Division by zero
except Exception as e:  # use a specific exception type here whenever possible
    # Code to handle the exception
    print("An exception occurred:", e)

Handling Specific Exceptions:

  • You can catch specific exceptions by specifying the exception type after the except keyword.
  • Example:

python

try:
    file = open('nonexistent_file.txt', 'r')
except FileNotFoundError as e:
    print("File not found:", e)

Using Multiple Except Blocks:

  • You can use multiple except blocks to handle different types of exceptions separately.
  • Example:

python

try:
    result = 10 / 0
except ZeroDivisionError as e:
    print("Division by zero error:", e)
except Exception as e:
    print("An exception occurred:", e)

Handling Exceptions with Else and Finally:

  • The else block runs if no exceptions are raised in the try block, while the finally block always runs, whether an exception is raised or not.
  • Example:

python

try:
    result = 10 / 2
except ZeroDivisionError as e:
    print("Division by zero error:", e)
else:
    print("No exceptions occurred!")
finally:
    print("Finally block always executes")

IX. Introduction to Python Libraries

A. Overview of Popular Libraries:

  1. NumPy:
    • Description: NumPy is a fundamental package for scientific computing in Python. It provides support for arrays, matrices, and mathematical functions to operate on these data structures efficiently.
    • Key Features:
      • Multi-dimensional arrays and matrices.
      • Mathematical functions for array manipulation.
      • Linear algebra, Fourier transforms, and random number capabilities.
    • Example:

python

import numpy as np

# Creating a NumPy array
arr = np.array([1, 2, 3, 4, 5])

  2. Pandas:
    • Description: Pandas is a powerful library for data manipulation and analysis. It provides data structures like Series and DataFrame, making it easy to handle structured data.
    • Key Features:
      • Data manipulation tools for reading, writing, and analyzing data.
      • Data alignment, indexing, and handling missing data.
      • Time-series functionality.
    • Example:

python

import pandas as pd

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

  3. Matplotlib:
    • Description: Matplotlib is a comprehensive library for creating static, interactive, and animated visualizations in Python. It provides functionalities to visualize data in various formats.
    • Key Features:
      • Plotting 2D and 3D graphs, histograms, scatter plots, etc.
      • Customizable visualizations.
      • Integration with Jupyter Notebook for interactive plotting.
    • Example:

python

import matplotlib.pyplot as plt

# Plotting a simple line graph
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Line Graph')
plt.show()

B. Installing and Importing Libraries:

Installing Libraries using pip:

  • Open a terminal or command prompt and use the following command to install libraries:

pip install numpy pandas matplotlib

Importing Libraries in Python:

  • Once installed, import the libraries in your Python script using import statements:

python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

  • After importing, you can use the functionalities provided by these libraries in your Python code.

X. Real-life Examples and Projects

A. Simple Projects for Practice:

  1. To-Do List Application:
    1. Create a command-line to-do list application that allows users to add tasks, mark them as completed, delete tasks, and display the list.
  2. Temperature Converter:
    1. Build a program that converts temperatures between Celsius and Fahrenheit or other temperature scales.
  3. Web Scraper:
    1. Develop a web scraper that extracts information from a website and stores it in a structured format like a CSV file.
  4. Simple Calculator:
    1. Create a basic calculator that performs arithmetic operations such as addition, subtraction, multiplication, and division.
  5. Hangman Game:
    1. Implement a command-line version of the Hangman game where players guess letters to reveal a hidden word.
  6. Address Book:
    1. Develop an address book application that stores contacts with details like name, phone number, and email address.
  7. File Organizer:
    1. Write a script that organizes files in a directory based on their file extensions or other criteria.

B. Exploring Python’s Applications in Different Fields:

  1. Web Development (Django, Flask):
    1. Python is widely used for web development. Explore frameworks like Django or Flask to build web applications, REST APIs, or dynamic websites.
  2. Data Science and Machine Learning:
    1. Use libraries like NumPy, Pandas, Scikit-learn, or TensorFlow to perform data analysis, create machine learning models, or work on predictive analytics projects.
  3. Scientific Computing:
    1. Python is used extensively in scientific computing for simulations, modeling, and solving complex mathematical problems. Use libraries like SciPy or SymPy for scientific computations.
  4. Natural Language Processing (NLP):
    1. Explore NLP with Python using libraries like NLTK or spaCy for text processing, sentiment analysis, or language translation tasks.
  5. Game Development:
    1. Develop simple games using Python libraries like Pygame, allowing you to create 2D games and learn game development concepts.
  6. Automation and Scripting:
    1. Create scripts to automate repetitive tasks like file manipulation, data processing, or system administration using Python’s scripting capabilities.
  7. IoT (Internet of Things) and Raspberry Pi Projects:
    1. Experiment with Python for IoT projects by controlling sensors, actuators, or devices using Raspberry Pi and Python libraries like GPIO Zero.

XI. Conclusion

A. Recap of Key Points:

  1. Python Basics: Python is a high-level, versatile programming language known for its simplicity, readability, and vast ecosystem of libraries and frameworks.
  2. Core Concepts: Understanding Python’s syntax, data types, control structures, functions, and handling exceptions is crucial for effective programming.
  3. Popular Libraries: Libraries like NumPy, Pandas, Matplotlib, etc., offer specialized functionalities for data manipulation, scientific computing, visualization, and more.
  4. Project Ideas: Simple projects, such as to-do lists, calculators, web scrapers, etc., provide practical experience and reinforce learning.
  5. Real-world Applications: Python’s applications span diverse fields like web development, data science, machine learning, scientific computing, automation, IoT, and more.

B. Encouragement for Further Exploration:

  1. Continuous Learning: Python’s versatility and vast ecosystem offer endless opportunities for learning and growth.
  2. Practice and Projects: Build upon your knowledge by working on more complex projects, contributing to open-source, and experimenting with different libraries and domains.
  3. Community Engagement: Engage with the Python community through forums, meetups, conferences, and online platforms to learn, share experiences, and collaborate.
  4. Stay Curious: Python evolves continuously, and exploring new libraries, updates, or trends keeps your skills up-to-date and opens doors to new possibilities.
  5. Persistence: Embrace challenges as learning opportunities. Persistence and dedication in learning Python will yield rewarding results in the long run.

C. Final Thoughts:

Python is an exceptional programming language renowned for its simplicity, readability, and versatility. Its applications span across numerous fields, from web development to scientific computing, data analysis, machine learning, and beyond. Whether you’re a beginner starting your programming journey or an experienced developer seeking new avenues, Python offers a rich ecosystem and a supportive community to aid your exploration and growth.

PYTHON DECEMBER 2023

DAY 1 - INSTALLATION

'''
Python Installation link:
https://www.digitalocean.com/community/tutorials/install-python-windows-10

Python is an interpreted language
'''
print(5*4, end=" and ")  # will evaluate
print('5*4')  # will print as it is
print("5*6")
print("5*6=\n"+str(5*6))  # functions have arguments - they are separated by ,
print("20"+"30", 20+30, 20, 30)
print("5*6="+str(5*6))

# This isn't right!
print("This isn't right!")
# He asked,"What's your name"?
print('''He asked,"What's your name"?''')
print("""He asked,"What's your name"?""")
print('This isn\'t right!')
print("He asked,\"What\'s your name\"?")

# \ - is called an ESCAPE SEQUENCE
# \ will add or remove power from the character that follows
print("\\n is used for newline in Python")
print("\\\\n will result in \\n")
print(r"\\n will result in \\n")  # raw string: backslashes are literal (often used for regular expressions)
print("HELLO"); print("HI")

## datatypes
# numeric: integer (int), float (float), complex (complex)
# text: string (str) - quoted with '  "  '''  """
# boolean: boolean (bool) - True and False
x = 1275
y = 6
print(x+y)
print(type(x))


# basic data types:
var1 = 5
print(type(var1))  # <class 'int'>

var1 = 5.0
print(type(var1))  # <class 'float'>

var1 = "5.0"
print(type(var1))  # <class 'str'>

var1 = """5.0"""
print(type(var1))  # <class 'str'>

var1 = True
print(type(var1))  # <class 'bool'>

var1 = 5j
print(type(var1))  # <class 'complex'>

length = 100
breadth = 15
area = length * breadth
peri = 2*(length + breadth)
print("Area of a rectangle with length", length, "and breadth", breadth, "is", area, "and perimeter is", peri)
# f-string
print(f"Area of a rectangle with length {length} and breadth {breadth} is {area} and perimeter is {peri}")

# float value
tot_items = 77
tot_price = 367
price_item = tot_price/tot_items
print(f"Cost of each item when total price paid is {tot_price} for {tot_items} items is {price_item:.1f} currency")

'''
Assignment submission process:
1. Create Google drive folder: share with the instructor
2. Within this folder - add your .py files
'''
'''
Assignment 1:
1. Write a program to calculate area and circumference of a circle and display info in a formatted manner
2. WAP to calculate area and perimeter of a square
3. WAP to calculate simple interest to be paid when principal amount, rate of interest and time is given
4. WAP to take degrees Celsius as input and give Fahrenheit output
'''

name, country, position = "Virat", "India", "Opening"
print(f"Player {name:<10} plays for {country:>12} as a/an {position:^15} in cricket.")
name, country, position = "Mangwaba", "Zimbabwe", "Wicket-keeper"
print(f"Player {name:<10} plays for {country:>12} as a/an {position:^15} in cricket.")

# operators: arithmetic operators
# -5 * -5 = 25
print(5j * 5j)  # (-25+0j)
val1, val2 = 10, 3
print(val1 + val2)
print(val1 - val2)
print(val1 * val2)
print(val1 / val2)   # 3.333... - division always returns a float

print(val1 % val2)   # modulo (%) - remainder
print(val1 // val2)  # integer division (no decimal part)
print(val1 ** val2)  # power: 10**3 = 10*10*10

# comparison operators

# complex numbers arise from square roots of negative numbers
# square root of 25 -> 5
# square root of -25 -> 5j
# Comparison operators - compare the values
# asking, is ...
# the output is always a bool value - True or False
val1, val2, val3 = 20, 20, 10
print(val1 > val2)   # is val1 greater than val2?
print(val1 >= val2)
print(val1 > val3)   # is val1 greater than val3?
print(val1 >= val3)  # True
print("Second set:")
print(val1 < val2)   # F
print(val1 <= val2)  # T
print(val1 < val3)   # F
print(val1 <= val3)  # F
print("third set:")
print(val1 == val2)  # T
print(val2 == val3)  # F

print(val1 != val2)  # F
print(val2 != val3)  # T
'''
a = 5   # assign value 5 to the variable a
a == 5  # is the value of a 5?
a != 5  # is the value of a not equal to 5?
'''
## Logical operators: and or not
'''
Commitment: I am going to cover Python and SQL in this course

Actual 1: I covered Python and SQL
Actual 2: I covered SQL
Actual 3: I covered Python

Commitment 2: I am going to cover Python or SQL in this course

Actual 1: I covered Python and SQL
Actual 2: I covered SQL
Actual 3: I covered Python
'''
# logical operators take bool values as input and the output is another bool
print(True and True)    # T
print(False and True)   # F
print(True and False)   # F
print(False and False)  # F
print("OR:")
print(True or True)     # T
print(False or True)    # T
print(True or False)    # T
print(False or False)   # F
print("NOT")
print(not True)
print(not False)
val1, val2, val3 = 20, 20, 10
print(val1 > val2 and val1 >= val2 or val1 > val3 and val1 >= val3 or val1 < val2 and val2 != val3)
# F and T or T and T or F and T
# F or T or F
# T
# Self Practice: output is True - solve it manually
print(val1 <= val2 or val1 < val3 and val1 <= val3 and val1 == val2 or val2 == val3 or val1 != val2)

# Bitwise operators: & | >> <<
print(bin(50))  # bin() converts into binary numbers
# 50 = 0b 110010
print(int(0b110010))  # int() will convert into a decimal number
print(oct(50))  # Octal number system: 0o62
print(hex(50))  # hexadecimal: 0x32

# Assignments (3 programs) - refer below
# bitwise operators
num1 = 50  # 0b 110010
num2 = 25  # 0b 011001
print(bin(50))
print(bin(25))
'''
110010
011001
111011 (|)
010000 (&)
'''
print(int(0b111011))  # bitwise | result = 59
print(int(0b10000))   # bitwise & result = 16
print(50 & 25)
print(50 | 25)
'''
Left Shift:
110010 << 1 = 1100100
Right Shift:
110010 >> 1 = 11001
'''
print(50 << 2)  # 50*2*2 : 110010 0 0
print(int(0b11001000))
print(50 >> 2)  # 50 / 2 / 2
print(int(0b1100))
# input() - to read values from the user
a = input('Enter the value for length: ')
print(a)
print(type(a))
a = int(a)
print(type(a))
# int(), float(), str(), complex(), bool()
b = int(input("Enter the value for breadth: "))
area = a*b
print("Area of the rectangle is", area)
total_marks = 150

if total_marks >= 200:
    print("Congratulations! You have passed the exam")
    print("You have 7 days to reserve your admission")
else:
    print("Sorry, you have not cleared the exam")
    print("Try again after 3 months")

print("Thank you")
#
marks = 75
'''
>=85: Grade A
>=75: B
>=60: C
>=50: D
<50: E
'''
if marks >= 85:
    print("Grade A")
elif marks >= 75:
    print("Grade B")
elif marks >= 60:
    print("Grade C")
elif marks >= 50:
    print("Grade D")
else:
    print("Grade E")

print("Done")
###
marks = 85
'''
>=85: Grade A
>=75: B
>=60: C
>=50: D
<50: E
'''
if marks >= 85:
    print("Grade A")

if marks >= 75 and marks < 85:
    print("Grade B")
if marks >= 60 and marks < 75:
    print("Grade C")
if marks >= 50 and marks < 60:
    print("Grade D")
if marks < 50:
    print("Grade E")

print("Done")
### NESTED IF
marks = 98.0001
'''
>=85: Grade A
>=75: B
>=60: C
>=50: D
<50: E
>90: award them with a medal
'''
if marks >= 85:
    print("Grade A")
    if marks >= 90:
        print("You win the medal")
        if marks > 98:
            print("Your photo will be on the wall of fame")
elif marks >= 75:
    print("Grade B")
elif marks >= 60:
    print("Grade C")
elif marks >= 50:
    print("Grade D")
else:
    print("Grade E")

print("Done")
'''
Practice basic programs from here:
https://www.scribd.com/document/669472691/Flowchart-and-C-Programs
'''

# check if a number is odd or even
num1 = int(input("Enter the number: "))
if num1 < 0:
    print("It's neither Odd nor Even")
else:
    if num1 % 2 == 0:
        print("It's Even")
    else:
        print("It's Odd")

## check the greater of the given two numbers:

num1, num2 = 20, 20
if num1 > num2:
    print(f"{num1} is greater than {num2}")
elif num2 > num1:
    print(f"{num2} is greater than {num1}")
else:
    print("They are equal")

## check the greater of the given three numbers:

num1, num2, num3 = 29, 49, 29
if num1 > num2:  # n1 > n2
    if num1 > num3:
        print(f"{num1} is greater")
    else:
        print(f"{num3} is greater")
else:  # n2 is greater than or equal to n1
    if num2 > num3:
        print(f"{num2} is greater")
    else:
        print(f"{num3} is greater")
##

# enter 3 sides of a triangle and check if they are:
# equilateral, isosceles, scalene, right-angled triangle
side1, side2, side3 = 90, 60, 30
if side1 == side2:
    if side1 == side3:
        print("Equilateral")
    else:
        print("Isosceles")
else:
    if side1 == side3:
        print("Isosceles")
    else:
        if side2 == side3:
            print("Isosceles")
        else:
            print("Scalene")

# modify the above code to handle Right-Angled triangle logic
# loops -
# FOR   : know how many times you need to repeat
# WHILE : don't know how many times but you have the condition
# range(start, stop, step): starts with start, goes up to stop (not including)
#   step: each time the value is increased by step
#   range(10,34,6): 10, 16, 22, 28
# range(start, stop): default step is 1
#   range(10,17): 10,11,12,13,14,15,16
# range(stop): default start is zero, default step is 1
#   range(5): 0,1,2,3,4

# generate values from 1 to 10
for counter in range(1, 11):  # 1,2,3,...,10
    print(counter, end=", ")
print()
print("Thank You")

# generate first 10 odd numbers
for odd_num in range(1, 20, 2):  # 1,3,5,...,19
    print(odd_num, end=", ")
print()
print("----------")
for counter in range(10):
    print(2*counter+1, end=", ")
print()
print("----------")
# generate even numbers till 50
for even_num in range(0, 51, 2):  # 0,2,4,...,50
    print(even_num, end=", ")
print()
##############
# WHILE: is always followed by a condition, and only if the condition is true do you get in
# WAP to print hello till the user says so
user = "y"
while user == "y":
    print("Hello")
    user = input("Enter y to continue or any other key to stop: ")
##

print("method 2")

while True:
    user = input("Enter y to continue or any other key to stop: ")
    if user != "y":
        break
    print("Hello")

print("Thank you")
count = int(input("How many times you want to print: "))
while count > 0:
    print("Hello")
    count -= 1  # count = count-1
# For loops
'''
* * * * *
* * * * *
* * * * *
* * * * *
* * * * *
'''
n = 5
for j in range(n):
    for i in range(n):
        print("*", end=" ")
    print()

'''
*
* *
* * *
* * * *
* * * * *
'''
n = 5
num_stars = 1
for j in range(n):
    for i in range(num_stars):
        print("*", end=" ")
    print()
    num_stars += 1

#
n = 5
for j in range(n):
    for i in range(j+1):
        print("*", end=" ")
    print()

'''
* * * * *
* * * *
* * *
* *
*
'''
for j in range(n):
    for i in range(n-j):
        print("*", end=" ")
    print()

'''
* * * * *
 * * * *
  * * *
   * *
    *
'''
for j in range(n):
    for k in range(j):
        print("", end=" ")
    for i in range(n-j):
        print("*", end=" ")
    print()

'''
Practice Program:
*
* *
* * *
* * * *
* * * * *
'''
'''
Multiplication table:
1 * 1 = 1    2 * 1 = 2    ...    10 * 1 = 10
1 * 2 = 2    2 * 2 = 4

10 * 10 = 100
'''

for mul in range(1, 11):
    for tab in range(1, 11):
        print(f"{tab:<2}* {mul:<2}= {tab*mul:<3}", end=" ")
    print()

'''
Print prime numbers between 5000 and 10,000

Is 10 prime or not?
Check divisibility by 2, 3, 4, ...
10 % 2 == 0 => not a prime
'''
for num in range(5000, 10000):
    isPrime = True
    for i in range(2, num//2 + 1):
        if num % i == 0:
            isPrime = False
            break
    if isPrime:
        print(num, end=", ")
'''
Trace: num = 11
isPrime = True
i goes through range(2,6); no divisor found
isPrime stays True, so 11 is printed
'''
# WAP to create a menu option to perform arithmetic operations
'''
Before you use a while loop, decide:
1. Should the loop run at least once (Exit Controlled), or
2. Should we check the condition even before running the loop (Entry Controlled)
'''
# method 1: Exit controlled
while True:
    num1 = int(input("Enter first number: "))
    num2 = int(input("Enter second number: "))
    print("Your Options: ")
    print("1. Add")
    print("2. Subtract")
    print("3. Multiply")
    print("4. Divide")
    print("5. Exit")
    ch = input("Enter your choice: ")
    if ch == "1":
        print("Addition = ", num1 + num2)
    elif ch == "2":
        print("Difference = ", num1 - num2)
    elif ch == "3":
        print("Multiplication = ", num1 * num2)
    elif ch == "4":
        print("Division = ", num1 / num2)
    elif ch == "5":
        break
    else:
        print("Invalid Option")

#
# method 2: Exit controlled
ch = "1"
while ch != "5":
    num1 = int(input("Enter first number: "))
    num2 = int(input("Enter second number: "))
    print("Your Options: ")
    print("1. Add")
    print("2. Subtract")
    print("3. Multiply")
    print("4. Divide")
    print("5. Exit")
    ch = input("Enter your choice: ")
    if ch == "1":
        print("Addition = ", num1 + num2)
    elif ch == "2":
        print("Difference = ", num1 - num2)
    elif ch == "3":
        print("Multiplication = ", num1 * num2)
    elif ch == "4":
        print("Division = ", num1 / num2)
    elif ch == "5":
        print("Exiting now...")
    else:
        print("Invalid Option")
##
# method 3: Entry controlled
choice = input("Enter Yes to perform arithmetic operations: ")
while choice.lower() == "yes":
    num1 = int(input("Enter first number: "))
    num2 = int(input("Enter second number: "))
    print("Your Options: ")
    print("1. Add")
    print("2. Subtract")
    print("3. Multiply")
    print("4. Divide")
    print("5. Exit")
    ch = input("Enter your choice: ")
    if ch == "1":
        print("Addition = ", num1 + num2)
    elif ch == "2":
        print("Difference = ", num1 - num2)
    elif ch == "3":
        print("Multiplication = ", num1 * num2)
    elif ch == "4":
        print("Division = ", num1 / num2)
    elif ch == "5":
        choice = "no"
        print("Exiting now...")
    else:
        print("Invalid Option")

#
# Generate odd numbers from 1 till the user wants to continue
num1 = 1
while True:
    print(num1)
    num1 += 2
    ch = input("Enter y to generate next number or any other key to stop: ")
    if ch != 'y':
        break
# Generate fibonacci numbers from 1 till the user wants to continue
num1 = 0
num2 = 1
while True:
    num3 = num1 + num2
    print(num3)
    num1, num2 = num2, num3
    ch = input("Enter y to generate next number or any other key to stop: ")
    if ch != 'y':
        break
# Generate fibonacci numbers from 1 till the user wants to continue

print("Hit Enter key to continue or any other key to stop! ")
num1 = 0
num2 = 1
while True:
    num3 = num1 + num2
    print(num3, end=" ")
    num1, num2 = num2, num3
    ch = input()
    if ch != "":
        break
import random
print(random.random())
print(random.randint(100, 1000))

from random import randint
print(randint(100, 1000))

# guess the number game - computer (has the number) vs human (attempting)
from random import randint

num = randint(1, 100)
attempt = 0
while True:
    guess = int(input("Guess the number (1-100): "))
    if guess < 1 or guess > 100:
        print("Invalid attempt!!!")
        continue

    attempt += 1  # attempt = attempt+1
    if guess == num:
        print(f"Congratulations! You got it right in {attempt} attempts.")
        break
    elif guess < num:
        print("Sorry, that's incorrect. Please try again with a higher number!")
    else:
        print("Sorry, that's incorrect. Please try again with a lower number!")

### ###
# guess the number game - computer (has the number) vs computer (attempting)
from random import randint
start, stop = 1, 100
num = randint(1, 100)
attempt = 0
while True:
    # guess = int(input("Guess the number (1-100): "))
    guess = randint(start, stop)
    if guess < 1 or guess > 100:
        print("Invalid attempt!!!")
        continue

    attempt += 1  # attempt = attempt+1
    if guess == num:
        print(f"Congratulations! You got it right in {attempt} attempts.")
        break
    elif guess < num:
        print(f"Sorry, {guess} is incorrect. Trying again with a higher number!")
        start = guess + 1
    else:
        print(f"Sorry, {guess} is incorrect. Trying again with a lower number!")
        stop = guess - 1

##
# guess the number game - computer vs computer, repeated 10,000 times
from random import randint
total_attempts = 0
for i in range(10000):
    start, stop = 1, 100
    num = randint(1, 100)
    attempt = 0
    while True:
        # guess = int(input("Guess the number (1-100): "))
        guess = randint(start, stop)
        if guess < 1 or guess > 100:
            print("Invalid attempt!!!")
            continue

        attempt += 1  # attempt = attempt+1
        if guess == num:
            print(f"Congratulations! You got it right in {attempt} attempts.")
            total_attempts += attempt
            break
        elif guess < num:
            print(f"Sorry, {guess} is incorrect. Trying again with a higher number!")
            start = guess + 1
        else:
            print(f"Sorry, {guess} is incorrect. Trying again with a lower number!")
            stop = guess - 1

print("========================================")
print("Average number of attempts = ", total_attempts/10000)
print("========================================")
'''
Multi line
text of
comments
which can go into
multiple lines
'''
# Strings
str1 = 'Hello'
str2 = "Hello there"
print(type(str1), type(str2))
str3 = '''How are you?
Where are you from?
Where do you want to go?'''
str4 = """I am fine
I live here
I am going there"""
print(type(str3), type(str4))
print(str3)
print(str4)
# one line of comment
'''
Multi line
text of
comments
which can go into
multiple lines
'''

# what's your name?
print('what\'s your name?')

# counting in Python starts from zero
str1 = 'Hello there how are you?'
print("Number of characters in str1 is", len(str1))
print("First character: ", str1[0], str1[-len(str1)])
print("Second character: ", str1[1])
print("Last character: ", str1[len(str1)-1])
print("Last character: ", str1[-1])
print("Second last character: ", str1[-2])
print("5th 6th 7th char: ", str1[4:7])
print("First 4 char: ", str1[0:4], str1[:4])
print("alternate chars from index 1 to 4: ", str1[1:5:2])
print("last 3 characters:", str1[-3:])
print("last 3 but one characters:", str1[-4:-1])
print(str1[5:1:-1])

txt1 = "HiiH"
txt2 = txt1[-1::-1]  # reversing the text
print(txt2)
txt2 = str1[-1:-7:-1]  # last 6 characters, reversed
print(txt2)
if txt2 == txt1:
    print("It's a palindrome")
else:
    print("It's not a palindrome")
var1 = 5
# print(var1[0])  # TypeError: 'int' object is not subscriptable

# add two strings
print("Hello"+", "+"How are you?")
print("Hello", "How are you?")
print(("Hello"+" ")*5)
print("* "*5)
# for loop - using strings
str1 = "hello"
for i in str1:
    print(i)

for i in range(len(str1)):
    print(i, str1[i])

print(type(str1))  # <class 'str'>
str2 = "HOW Are You?"
up_count, lo_count, sp_count = 0, 0, 0
for i in str2:
    if i.islower():
        lo_count += 1
    if i.isupper():
        up_count += 1
    if i.isspace():
        sp_count += 1

print(f"Number of spaces={sp_count}, uppercase letters={up_count} and lowercase letters={lo_count}")

# input values:
val1 = input("Enter a number: ")
if val1.isdigit():
    val1 = int(val1)
    print(val1 * 5)
else:
    print("Invalid value")

str3 = "123af ds"
print(str3.isalnum())  # False - the string contains a space

#
str1 = "How are You"
# docs.python.org
help(str.isascii)

help(help)


str1 = "HOw are YOU today?"
print(str1.upper())
print(str1.lower())
print(str1.title())
# str1 = str1.title()
# strings are immutable - you can't edit them in place
# str1[3] = "A"  # TypeError: 'str' object does not support item assignment
str1 = str1[0:3]+"A"+str1[4:]
print(str1)
cnt = str1.lower().count('o')
print(cnt)
cnt = str1.count('O', 3, 15)  # substring, start, end
print(cnt)
# Strings - methods
str1 = "Hello how are you doing today"
var1 = str1.split()
print("Var 1 =", var1)
var2 = str1.split('o')
print("Var 2 =", var2)
str2 = "1,|Sachin,|Mumbai,|Cricket"
var3 = str2.split(',|')
print(var3)
str11 = " ".join(var1)
print("Str11 = ", str11)
str11 = "".join(var2)
print("Str11 = ", str11)
str11 = "--".join(var3)
print("Str11 = ", str11)
# Strings - methods
str1 = "Hello how are you doing today"
str2 = str1.replace('o', 'ooo')
print(str2)
cnt = str1.count('z')
print("Number of z in str1 =", cnt)
find_cnt = str1.find('ow')
if find_cnt == -1:
    print("Given substring is not in the main string")
else:
    print("Substring in str1 found at =", find_cnt)

find_cnt = str1.find('o', 5, 6)
print("Substring in str1 found at =", find_cnt)

str2 = str1.replace('z', 'ooo', 3)
print(str2)

################
## LIST = Linear Ordered Mutable Collection
l1 = [55, 'Hello', False, 45.9, [2, 4, 6]]
print("type of l1 = ", type(l1))
print("Number of members in the list =", len(l1))
print(l1[0], l1[4], l1[-1])
print("type of l1[0] =", type(l1[0]))
print("type of l1[-1] =", type(l1[-1]))
l2 = l1[-1]
print(l2[0], l1[-1][0], type(l1[-1][0]))
l1[0] = 95
print("L1 =", l1)
## LIST = Linear Ordered Mutable Collection
l1 = [55, 'Hello', False, 45.9, [2, 4, 6]]

for member in l1:
    print(member)

print(l1+l1)
print(l1*2)

print("count = ", l1.count(False))
print("count = ", l1.count('Hello'))
# remove second last member - pop takes a position
l1.pop(-2)
print("L1 after Pop: ", l1)
l1.pop(-2)
print("L1 after Pop: ", l1)
# delete an element - remove takes a value
cnt = l1.count('Helloo')
if cnt > 0:
    l1.remove('Helloo')
    print("L1 after Remove: ", l1)
else:
    print("'Helloo' not in the list")

# Collections - Lists - linear mutable ordered collection
l1 = [10,50,90,20,90]
# add and remove members
l1.append(25) #append will add at the end
l1.append(45)
print("L1 after append: ",l1)
#insert takes position and the value to add
l1.insert(2,35)
l1.insert(2,65)
print("L1 after insert: ",l1)
l1.remove(35) #takes value to delete
l1.remove(90)
print("L1 after remove: ",l1)
cnt_90 = l1.count(90)
print("Number of 90s: ",cnt_90)
l1.pop(2) #index at which you want to delete
print("L1 after pop: ",l1)

# Collections - Lists - linear mutable ordered collection
l1 = [10,50,90,20,90]
l2 = l1.copy() #shallow copy - photocopy
l3 = l1 #alias - same list with two names, not a copy
print("1. L1 = ",l1)
print("1. L2 = ",l2)
print("1. L3 = ",l3)
l1.append(11)
l2.append(22)
l3.append(33)
print("2. L1 = ",l1)
print("2. L2 = ",l2)
print("2. L3 = ",l3)
print("Index of 90:",l1.index(90,3,7))

# Extend: l1 = l1+l2
l2 = [1,2,3]
l1.extend(l2)
print("L1 after extend:",l1)
l1.reverse()
print("L1 after reverse: ",l1)
l1.sort() #sort in ascending order
print("L1 after sort: ",l1)
l1.sort(reverse=True) #sort in descending order
print("L1 after reverse sort: ",l1)
l1.clear()
print("L1 after clear: ",l1)

######## question from Vivek: ###########
# find two values in the list that add up to the target
l1 = [9,5,7,2]
target = 12
l2 = l1.copy()
l2.sort() #[2,5,7,9]
for i in range(len(l2)-1):
    if l2[i]+l2[i+1] == target:
        #l1.index(l2[i]), l1.index(l2[i+1])
        break
    # else: if the pair sum is greater than target, stop;
    # if it is smaller than target, check i+1 with i+2
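The partial loop above can be completed into a working function. This is a minimal sketch; `two_sum` is a name chosen for the example, and the sorted two-pointer walk is one common way to finish the idea:

```python
def two_sum(nums, target):
    """Return the positions of two values in nums that add up to target."""
    s = sorted(nums)
    lo, hi = 0, len(s) - 1
    while lo < hi:
        pair = s[lo] + s[hi]
        if pair == target:
            # map the values back to their positions in the original list
            i = nums.index(s[lo])
            j = nums.index(s[hi], i + 1) if s[lo] == s[hi] else nums.index(s[hi])
            return i, j
        elif pair < target:
            lo += 1   # sum too small: move the left pointer right
        else:
            hi -= 1   # sum too large: move the right pointer left
    return None

print(two_sum([9, 5, 7, 2], 12))  # -> (1, 2), the positions of 5 and 7
```

Sorting a copy keeps the original positions recoverable with `index()`, which matches the commented-out `l1.index(...)` line in the notes.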
# Tuple - linear ordered immutable collection
l1 = [2,4,6,8]
print(l1, type(l1))
t1 = (2,4,6,8,2,4,6,2)
print(t1, type(t1))
l1[1] = 14
print("L1 = ",l1)

#t1[1] = 14 TypeError: 'tuple' object does not support item assignment

print("Index of 2 =",t1.index(2))
print("Count of 2 =",t1.count(2))
print(t1, type(t1))
t1 = list(t1)
t1[1] = 14
t1 = tuple(t1)
print(t1, type(t1))
for i in t1:
    print(i)


t2 = (3,6,9) #packing
a,b,c = t2 #unpacking
print(a,type(a),b,type(b),c,type(c))

t3 = ()
print("type of t3=",type(t3))

t4 = ("Hello",4)
print("type of t4=",type(t4))

# ("Hello" + "World")*3 -> "Hello" + "World"*3
###############
# Dictionary: unordered mutable collection
# pairs of key:value
d1 = {0:3, 1:6, 2:9}
print("type = ",type(d1))
print(d1[1])

basic_health = {"Name":"Sachin",
                "Weight":156,
                "Age":42,
                23:"NY"}

print(basic_health["Name"])

patients = [{"Name":"Sachin","Weight":156,"Age":42,23:"NY"},
            {"Name":"Virat","Weight":126,"Age":38,23:"NY"},
            {"Name":"Rohit","Weight":176,"Age":24,23:"NY"},
            {"Name":"Kapil","Weight":196,"Age":62,23:"NY"}]

print(patients[1]["Weight"])
# duplicate keys: only the last "Age" value is kept
basic_health = {"Name":"Sachin",
                "Weight":2,
                "Age":2,
                "Age":10,
                23:"NY",
                "Age":15}

print(basic_health.keys())
print(basic_health.values())
print(basic_health.items())

# Dictionary
'''
WAP to input marks of three students in three subjects

marks = {'Sachin': [78, 87, 69], 'Kapil': [59, 79, 49], 'Virat': [88, 68, 78]}
'''
students = ['Sachin','Kapil','Virat']
subjects = ['Maths','Science','English']
marks = {}
#marks_list = []
num_students, num_subjects = 3,3

for i in range(num_students):
    marks_list = []
    for j in range(num_subjects):
        m = int(input("Enter the marks in subject " + subjects[j] + " : "))
        marks_list.append(m)
    temp = {students[i]:marks_list}
    marks.update(temp)
    #marks_list.clear()

print("Marks entered are: ",marks)

# Dictionary
'''
WAP to input marks of three students in three subjects.
calculate total and average of marks for all the 3 students
find who is the highest scorer in total and also for each subject

marks = {'Sachin': [78, 87, 69], 'Kapil': [59, 79, 49], 'Virat': [88, 68, 78]}
'''
students = ['Sachin', 'Kapil', 'Virat']
subjects = ['Maths', 'Science', 'English']
marks = {'Sachin': [78, 87, 69], 'Kapil': [59, 79, 49], 'Virat': [88, 68, 78]}
topper = {'Total': -1, 'Name': []}
subject_highest = [-1, -1, -1]

num_students, num_subjects = 3, 3
for i in range(num_students):
    tot, avg = 0, 0
    key = students[i]
    for j in range(num_subjects):
        tot = tot + marks[key][j]
        # checking the highest values for each subject
        # ...

    avg = tot / 3
    print(f"Total marks obtained by {students[i]} is {tot} and average is {avg:.1f}")
    # check highest total
    if tot >= topper['Total']:
        topper['Total'] = tot
        topper['Name'].append(key)

print(f"{topper['Name']} has topped the class with total marks of {topper['Total']}")
# Dictionary
'''
WAP to input marks of three students in three subjects.
calculate total and average of marks for all the 3 students
find who is the highest scorer in total and also for each subject

marks = {'Sachin': [78, 87, 69], 'Kapil': [59, 79, 49], 'Virat': [88, 68, 78]}
'''
students = ['Sachin','Kapil','Virat']
subjects = ['Maths','Science','English']
marks = {'Sachin': [78, 87, 69], 'Kapil': [59, 79, 49], 'Virat': [88, 68, 78]}
topper = {'Total':-1, 'Name':[]}
subject_highest = [-1,-1,-1]

num_students, num_subjects = 3,3
for i in range(num_students):
    tot, avg = 0, 0
    key = students[i]
    for j in range(num_subjects):
        tot = tot + marks[key][j]
        #checking the highest values for each subject
        if marks[key][j] > subject_highest[j]:
            subject_highest[j] = marks[key][j]

    avg = tot / 3
    print(f"Total marks obtained by {students[i]} is {tot} and average is {avg:.1f}")
    # check highest total
    if tot >= topper['Total']:
        topper['Total'] = tot
        topper['Name'].append(key)

print(f"{topper['Name']} has topped the class with total marks of {topper['Total']}")
print(f"Highest marks for subjects {subjects} is {subject_highest}")

marks = {'Sachin': [78, 87, 69], 'Kapil': [59, 79, 49], 'Virat': [88, 68, 78]}

# deep & shallow copy
marks2 = marks  # alias - same dictionary with two names
marks3 = marks.copy()  # shallow copy
print("before update:")
print("Marks = ",marks)
print("Marks2 = ",marks2)
print("Marks3 = ",marks3)
marks2.pop('Kapil')
marks.update({'Mahi':[71,91,81]})
print("after update:")
print("Marks = ",marks)
print("Marks2 = ",marks2)
print("Marks3 = ",marks3)

###########################
## SETS
# SETS
l1 = ['Apple','Apple','Apple','Apple','Apple']
print("Values in L1 = ",len(l1))

s1 = {'Apple','Apple','Apple','Apple','Apple'}
print(type(s1))
print("Values in S1 = ",len(s1))
# property 1: removes duplicate values

s1 = {'Apple','Banana','Orange','Grapes','Mango'}
s2 = {'Grapes','Mango','Guava','Pine apple','Cherry'}
# property 2: order doesnt matter

print("union - total how many values")
print(s1 | s2)
print(s1.union(s2))
print("Intersection - common values between the sets")
print(s1 & s2)
print(s1.intersection(s2))
print("Difference (minus) - remove one set of values from another set")
print(s1 - s2)
print(s1.difference(s2))
print(s2 - s1)
print(s2.difference(s1))

print("Symmetric difference")
print(s1 ^ s2)
print(s1.symmetric_difference(s2))
##
s1 = {1,2,3,4,5,6}
s2 = {4,5,6}
print(s1.isdisjoint(s2))
print(s1.issuperset(s2))

# sets, lists, tuples -> they are convertible into each other
l1 = ['Apple','Apple','Apple','Apple','Apple']
l1 = list(set(l1))
print(l1)
s1 = {4,2,3}
print(s1)
# Functions

def smile():
    txt = ''' A smile, a curve that sets all right,
    Lighting days and brightening the night.
    In its warmth, hearts find their flight,
    A silent whisper of pure delight.'''
    print(txt)

smile()

smile()

smile()

#==================

# function to calculate gross pay
def calc_grosspay():
    basic_salary = 5000
    hra = 0.1 * basic_salary
    da = 0.4 * basic_salary
    gross_pay = basic_salary + hra + da
    print("Your gross pay is",gross_pay)

def calc_grosspay_withreturn():
    basic_salary = 5000
    hra = 0.1 * basic_salary
    da = 0.4 * basic_salary
    gross_pay = basic_salary + hra + da
    return gross_pay

def calc_grosspay_return_input(basic_salary):
    hra = 0.1 * basic_salary
    da = 0.4 * basic_salary
    gross_pay = basic_salary + hra + da
    return gross_pay


bp_list = [3900,5000,6500,9000]
gp_1 = calc_grosspay_return_input(bp_list[3])
print("Gross Pay for this month is",gp_1)

gp = calc_grosspay_withreturn()
print("Total gross pay for ABC is",gp)
gp_list = []
gp_list.append(gp)
calc_grosspay()
# Functions
HOUSENO = 55
def myfunc1():
    #x = 51
    global x
    print("1 Value of x =",x)
    print("My House No =", HOUSENO)
    x = 51
    print("2 Value of x =", x)


def myfunc2(a,b):
    print("======== MYFUNC2 ========")
    print(f"a = {a} and b = {b}")
    print("Sum of a and b = ",a+b)

def myfunc3(a=5,b=3.14):
    print("======== MYFUNC3 ========")
    print(f"a = {a} and b = {b}")
    print("Sum of a and b = ",a+b)

def myfunc4(a,b):
    print("======== MYFUNC4 ========")
    print(f"a = {a} and b = {b}")
    print("Sum of a and b = ",a+b)


def myfunc5(a,*b):
    print("a = ",a)
    print("b = ", b)

def myfunc6(a,*b,**c):
    print("a = ",a)
    print("b = ", b)
    print("c = ", c)

# default arguments (there is a default value added)
myfunc3(10,20)
myfunc3(10)
myfunc3()
# required positional arguments
myfunc2(10,20)

x = 5
myfunc1()
print("Value of x =",x)
# non-positional = keyword arguments
myfunc4(b=22, a=33)

# variable length arguments
myfunc5(10,20,30,40)
myfunc5(10)
myfunc5(10, 20)
myfunc6(10, 20, "hello", name="Sachin", runs=3000, city="Mumbai")
# function to check prime numbers
'''
10 -> check divisors from 2 to 5
7 -> 2, 3
9 -> 2, 3, 4
'''

def gen_prime(num):
    '''
    This function takes a parameter and checks if it is a prime number or not
    :param num: number (int)
    :return: True/False (True for prime number)
    '''
    isPrime = True
    for i in range(2, num//2 + 1):
        if num % i == 0:
            isPrime = False
            break
    return isPrime


if __name__ == "__main__":
    num = 11
    print(num, " : ", gen_prime(num))
    num = 100
    print(num, " : ", gen_prime(num))

    # generate prime numbers between given range
    start, end = 1000, 5000
    for i in range(start, end):
        check = gen_prime(i)
        if check:
            print(i, end=", ")


    # doc string: multi line comment added at the beginning of the function
    help(gen_prime)
#import infy_apr as ia
from infy_apr import gen_prime

def product_val(n1,n2):
    return n1 * n2

if __name__ == "__main__":
    num1 = 1
    num2 = 3
    print("Sum of two numbers is",num2+num1)
    # generate prime numbers between 50K to 50.5K
    for i in range(50000,50500):
        check = gen_prime(i)
        if check:
            print(i, end=", ")
# recursive functions
'''
O -> L1 (20) -> L1(19) -> L1(18) ... -> 1 -> 1
'''
def sayhi(n):
    if n>0:
        print("Hello")
        sayhi(n-1)
    else:
        return 1

sayhi(20)

'''
Factorial of a number:
5! = 5 * 4 * 3 * 2 * 1
'''

def facto(n):
    if n<=1:
        return 1
    else:
        return n * facto(n-1)

result = facto(5)
print("Factorial is",result)

#############
def f1():
    def f2():
        print("I am in f2 which is inside f1")

    print("first line of f1")
    f2()
    print("second line of f1")

def calculate(n1,n2,op):
    def plus(n1,n2):
        return n1 + n2
    def diff(n1,n2):
        return n1 - n2
    if op == "+":
        output = plus(n1,n2)
    if op == "-":
        output = diff(n1, n2)
    return output

res = calculate(5,10,"+")
print("1. Result = ",res)
res = calculate(5,10,"-")
print("2. Result = ",res)

####################

def plus(n1,n2):
    return n1 + n2
def diff(n1,n2):
    return n1 - n2
def calculate(n1,n2,func):
    output = func(n1,n2)
    return output

res = calculate(5,10,plus)
print("1. Result = ",res)
res = calculate(5,10,diff)
print("2. Result = ",res)
###############

# in-built functions()
# user defined functions()
# anonymous / one line / lambda

def myfunc1(a,b):
    return a**b
#above myfunc1() can also be written as:
myfunc2 = lambda a,b: a**b
print("5 to power of 4 is",myfunc2(5,4))

'''
map: apply same logic on all the values of the list: multiply all the values by 76
filter: filter out values in a list based on a condition: remove -ve values
reduce: reduce multiple values in a list to a single value
'''
# a = 11, b = 12, c = 13...
calc = 0
list1 = ['a','b','c','d']
word = input()
for i in word:
    calc = calc + list1.index(i) + 11 # e.g. "acd" -> 11 + 13 + 14
print(calc)


'''
map: apply same logic on all the values of the list: multiply all the values by 76
filter: filter out values in a list based on a condition: remove -ve values
reduce: reduce multiple values in a list to a single value
'''
value_usd = [12.15,34.20,13,8,9,12,45,87,56,78,54,34]
value_inr = []
# 1 usd = 78 inr
for v in value_usd:
    value_inr.append(v*78)
print("Value in INR: ",value_inr)

value_inr = list(map(lambda x: 78*x, value_usd))
print("Value in INR: ",value_inr)

# filter: filter out the values
new_list = [12,7,0,-5,-6,15,18,21,-44,-90,-34,56,43,12,7,0,-5,-6,15,18,21,-44,-90,-34,56,43]
output_list = list(filter(lambda x: x>=0, new_list))
print("Filtered: ",output_list)

output_list = list(filter(lambda x: x%3==0 and x>=0, new_list))
print("Filtered: ",output_list)

# reduce
import functools as ft
#from functools import reduce
new_list = [12,7,0,-5,-6,15,18,21,-44,-90,-34,56,43]
val = ft.reduce(lambda x,y: x+y, new_list)
print("Value after reduce = ",val)
'''
x+y => [12,7,0,-5,-6,15,18,21,-44,-90,-34,56,43]
12+7
19+0
19+ -5
14 + -6
8+15
...
'''
abc = lambda x,y: x+y
# the lambda above is equivalent to:
def abc(x,y):
    return x+y

##################################

## class & objects
'''
car - class
number of wheels - 4, color, make

driving
parking
'''

class Book:
    number_of_books = 0

    def reading(self):
        print("I am reading a book")

b1 = Book() #creating object of class Book
b2 = Book()
b3 = Book()
b4 = Book()
print(b1.number_of_books)
b1.reading()
'''
class level variables and methods
object level variables and methods
'''
'''
__init__() : will be called automatically when the object is created
'''
class Book:
    book_count = 0 # class level variable

    def __init__(self,title): # object level method
        self.title = title # object level variable
        total = 0 #normal (local) variable
        Book.book_count += 1

    @classmethod
    def output(cls):
        print("Total books now available = ", Book.book_count)

b1 = Book("Python Programming")
b2 = Book("SQL Programming")
b3 = Book("")
print(type(b1))

#############
print("Printing book_count: ")
print(b1.book_count)
print(b2.book_count)
print(b3.book_count)
print(Book.book_count)
print("Printing output:")
b1.output()
b2.output()
b3.output()
Book.output()
print("Printing Title")
print("B1 title: ", b1.title)
print("B2 title: ", b2.title)
print("B3 title: ", b3.title)
#print(Book.title) AttributeError: type object 'Book' has no attribute 'title'

##############
class MyMathOp:

    def __init__(self,a,b):
        self.n1 = a
        self.n2 = b

    def add_numbers(self):
        self.total = self.n1 + self.n2

    def subtract_numbers(self):
        self.subtract = self.n1 - self.n2

    def check_prime(self):
        # check if n1 is prime or not
        self.checkPrime = True
        for i in range(2, self.n1//2 + 1):
            if self.n1 % i == 0:
                self.checkPrime = False

m1 = MyMathOp(15,10)
print(m1.n1)
m1.check_prime()
print(m1.checkPrime)
'''
Encapsulation: information hiding - creating class
Abstraction: implementation hiding
Inheritance: inheriting properties from another class
Polymorphism: having multiple forms
'''

class Shape:
    def __init__(self,s1=0,s2=0,s3=0,s4=0):
        self.s1 = s1
        self.s2 = s2
        self.s3 = s3
        self.s4 = s4
        self.area = -1
        self.surfacearea = -1

    def print_val(self):
        if self.s1>0:
            print("Side 1 = ",self.s1)
        if self.s2>0:
            print("Side 2 = ",self.s2)
        if self.s3>0:
            print("Side 3 = ",self.s3)
        if self.s4>0:
            print("My Side 4 = ",self.s4)

    def myarea(self):
        print("Area is not implemented!")

    '''
    def mysurfacearea(self):
        print("Surface area is not implemented!")
    '''

class Rectangle(Shape):
    def __init__(self,s1,s2):
        Shape.__init__(self,s1,s2)

    def myarea(self):
        print("Area is",self.s1*self.s2)

class Circle(Shape):
    def __init__(self,s1):
        Shape.__init__(self,s1)

    def myarea(self):
        print("Area is",3.14*self.s1*self.s1) # area = pi * r * r


r1 = Rectangle(34,45)
r1.print_val()
r1.myarea()
c1 = Circle(12)
c1.print_val()
c1.myarea()
'''
Encapsulation: information hiding - creating class
Abstraction: implementation hiding
Inheritance: inheriting properties from another class
Polymorphism: having multiple forms
'''

class Shape:
    def __init__(self,s1=0,s2=0,s3=0,s4=0):
        self.s1 = s1
        self.s2 = s2
        self.s3 = s3
        self.s4 = s4
        self.area = -1
        self.surfacearea = -1

    def print_val(self):
        if self.s1>0:
            print("Side 1 = ",self.s1)
        if self.s2>0:
            print("Side 2 = ",self.s2)
        if self.s3>0:
            print("Side 3 = ",self.s3)
        if self.s4>0:
            print("My Side 4 = ",self.s4)

    # Python does not support method overloading:
    # the last definition of myarea() replaces the earlier ones
    def myarea(self):
        print("Area is not implemented!")

    def myarea(self,s1):
        pass

    def myarea(self,s1,s2):
        pass

    def mysurfacearea(self):
        print("Surface area is not implemented!")

    def dummy1(self): #public member
        print("Shape.Dummy1")

    def _dummy2(self): # protected
        print("Shape.Dummy2")

    def __dummy3(self): # private
        print("Shape.Dummy3")

    def dummy4(self):
        Shape.__dummy3(self)

class Rectangle(Shape):
    def __init__(self,s1,s2):
        Shape.__init__(self,s1,s2)

    def myarea(self):
        print("Area is",self.s1*self.s2)

class Circle(Shape):
    def __init__(self,s1):
        Shape.__init__(self,s1)

    def myarea(self):
        print("Area is",3.14*self.s1*self.s1) # area = pi * r * r

class Cuboid(Rectangle):
    def something(self):
        print("In Cuboid")

class AnotherShape:
    def test1(self):
        print("AnotherShape.test1")
        Shape.dummy1(self)
        Shape._dummy2(self)
        #Shape.__dummy3(self)


r1 = Rectangle(34,45)
r1.print_val()
r1.myarea()
c1 = Circle(12)
c1.print_val()
c1.myarea()

s1 = Shape()
#s1.myarea()
#s1.myarea(10)
#s1.myarea(10,20)

as1 = AnotherShape()
as1.test1()
'''
public: anyone can call public members of a class
protected (_var): (the concept exists by convention only) - behaves like public
    concept: only the derived class should call it
private (__var): available only within the given class
'''
#s1.__dummy3()
#r1.__dummy3()
s1.dummy4()
# Exception Handling - Errors
# syntax error
print("hello")

# logical error

# runtime errors - exceptions
a = 50
try:
    b = int(input("Enter the denominator: "))
except ValueError:
    print("You have provided invalid value for B, changing the value to 1")
    b = 1

try:
    print(a/b) # ZeroDivisionError
    print("A by B is",a/b)
except ZeroDivisionError:
    print("Sorry, we cant perform the analysis as denominator is zero")

print("thank you")

################
a = 50
b = input("Enter the denominator: ")
try:
    print("A by B is", a / int(b)) # ZeroDivisionError & ValueError

except ValueError:
    print("You have provided invalid value for B, changing the value to 1")
    b = 1

except ZeroDivisionError:
    print("Sorry, we cant perform the analysis as denominator is zero")

except Exception:
    print("An error has occurred, hence skipping this section")

else:
    print("So we got the answer now!")

finally:
    print("Not sure if there was an error but we made it through")
print("thank you")

# File handling
'''
Working with Text files:
1. read: read(), readline(), readlines()
2. write: write(), writelines()
3. append

Modes: r, r+, w, w+, a, a+

Accessing the file:
1. Absolute path: full path starting from the drive/root
2. Relative path: path relative to the current working directory
'''
path = "C:/Folder1/Folder2/txt1.txt"
path = "C:\\Folder1\\Folder2\\txt1.txt"
path = "ptxt1.txt"

content = '''Twinkle twinkle little star
How I wonder what you are
Up above the world so high
like a diamond in the sky
'''

file_obj = open(path, "a+")

file_obj.write(content)

file_obj.seek(0) # go to the beginning of the content

read_cnt = file_obj.read()
file_obj.close()

print(read_cnt)

#################

path = "ptxt1.txt"

file_obj = open(path, "r")

read_cnt = file_obj.read()

print("============")
print(read_cnt)
file_obj.seek(0)
read_cnt = file_obj.read(10)
print("============")
print(read_cnt)

read_cnt = file_obj.readline()
print("============")
print(read_cnt)
read_cnt = file_obj.readline(10000)
print("============")
print(read_cnt)
file_obj.close()

################

path = "ptxt1.txt"

file_obj = open(path, "r+")

file_cnt = file_obj.readlines()

print(file_cnt)
file_obj.close()

file_obj = open(path, "w")
write_nt = ['Hello how are you?\n', 'I am fine\n', 'Where are you going\n', 'sipdfjisdjisdjf\n']
file_obj.writelines(write_nt)
file_obj.close()
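The same write-then-read sequence can also be done with the with statement (context manager), which closes the file automatically even when an error occurs. A minimal sketch (the filename "ptxt2.txt" is just an example):

```python
path = "ptxt2.txt"

# each 'with' block closes the file for us when it exits
with open(path, "w") as file_obj:
    file_obj.writelines(["Hello how are you?\n", "I am fine\n"])

with open(path, "r") as file_obj:
    lines = file_obj.readlines()

print(lines)
```

This avoids the explicit close() calls used in the snippets above and is the recommended style for file handling.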

How to add a Machine Learning Project to GitHub

Maintaining a GitHub data science portfolio is essential for data science professionals and students in their career: it showcases their skills and projects.

Steps to add an existing Machine Learning Project in GitHub

Step 1: Install GIT on your system

We will use the git command-line interface which can be downloaded from:

https://git-scm.com/book/en/v2/Getting-Started-Installing-Git

Step 2: Create GitHub account here:

https://github.com/

Step 3: Now we create a repository for our project. It’s always a good practice to initialize the project with a README file.

Step 4: Go to the Git folder located in Program Files\Git and open the git-bash terminal.

Step 5: Now navigate to the Machine Learning project folder using the following command.

cd PATH_TO_ML_PROJECT

Step 6: Type the following git initialization command to initialize the folder as a local git repository.

git init

We should get a message “Initialized empty Git repository in your path” and .git folder will be created which is hidden by default.

Step 7: Add files to the staging area for committing using this command which adds all the files and folders in your ML project folder.

git add .

Note: git add filename.extension can also be used to add individual files.

Step 8: We will now commit the files from the staging area and add a message to our commit. It is always a good practice to have meaningful commit messages, which will help us understand the commits during future visits and revisions. Type the following command for your first commit.

git commit -m "Initial project commit"

Step 9: This only adds our files to the local branch of our system and we have to link with our remote repository in GitHub. To link them go to the GitHub repository we have created earlier and copy the remote link under “..or push an existing repository from the command line”.

First, get the url of the github project:

Now, In the git-bash window, paste the command below followed by your remote repository’s URL.

git remote add origin YOUR_REMOTE_REPOSITORY_URL

Step 10: Finally, we have to push the local repository to the remote repository in GitHub

git push -u origin master

Sign into your github account

Authorize GitCredentialManager

After this, the Machine Learning project will be added to your GitHub with the files.

We have successfully added an existing Machine Learning Project to GitHub. Now is the time to create your GitHub portfolio by adding more projects to it.

Restricted Boltzmann Machine and Its Application

A restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. Restricted Boltzmann machines can also be used in deep learning networks. In particular, deep belief networks can be formed by “stacking” RBMs and optionally fine-tuning the resulting deep network with gradient descent and backpropagation. This deep learning algorithm became very popular after the Netflix Competition where RBM was used as a collaborative filtering technique to predict user ratings for movies and beat most of its competition. It is useful for regression, classification, dimensionality reduction, feature learning, topic modelling and collaborative filtering.

Restricted Boltzmann Machines are stochastic two-layer neural networks which belong to a category of energy-based models that can detect inherent patterns in the data automatically by reconstructing the input. They have two layers: visible and hidden. The visible layer has input nodes (nodes which receive input data), and the hidden layer is formed by nodes which extract feature information from the data; the output at the hidden layer is a weighted sum of the inputs. RBMs don’t have output nodes and don’t produce the typical labelled output through which patterns are usually learnt; learning happens by reconstruction instead, which is what makes them different. We only take care of the input nodes and don’t worry about the hidden nodes. Once the input is provided, an RBM automatically captures the patterns, parameters and correlations in the data.

What is a Boltzmann Machine?

Let’s first understand what a Boltzmann Machine is. The Boltzmann Machine was first invented in 1985 by Geoffrey Hinton, a professor at the University of Toronto. He is a leading figure in the deep learning community and is referred to by some as the “Godfather of Deep Learning”.

Boltzmann Machine
  • Boltzmann Machine is a generative unsupervised model, which involves learning a probability distribution from an original dataset and using it to make inferences about never before seen data.
  • Boltzmann Machine has an input layer (also referred to as the visible layer) and one or several hidden layers (also referred to as the hidden layer).
  • Boltzmann Machine uses neural networks with neurons that are connected not only to other neurons in other layers but also to neurons within the same layer.
  • Everything is connected to everything. Connections are bidirectional, visible neurons connected to each other and hidden neurons also connected to each other
  • Boltzmann Machine doesn’t expect input data, it generates data. Neurons generate information regardless of whether they are hidden or visible.
  • For a Boltzmann Machine all neurons are the same; it doesn’t discriminate between hidden and visible neurons. To a Boltzmann Machine the whole network is one system, and it generates the states of that system.

In Boltzmann Machine, we use our training data and feed into the Boltzmann Machine as input to help the system adjust its weights. It resembles our system not any such system in the world. It learns from the input, what are the possible connections between all these parameters, how do they influence each other and therefore it becomes a machine that represents our system. Boltzmann Machine consists of a neural network with an input layer and one or several hidden layers. The neurons in the neural network make stochastic decisions about whether to turn on or off based on the data we feed during training and the cost function the Boltzmann Machine is trying to minimize. By doing so, the Boltzmann Machine discovers interesting features about the data, which help model the complex underlying relationships and patterns present in the data.

This Boltzmann Machine uses neural networks with neurons that are connected not only to neurons in other layers but also to neurons within the same layer. That makes training an unrestricted Boltzmann machine very inefficient, and the Boltzmann Machine had very little commercial success.

Boltzmann Machines are primarily divided into two categories: Energy-based Models (EBMs) and Restricted Boltzmann Machines (RBM). When these RBMs are stacked on top of each other, they are known as Deep Belief Networks (DBN). Our focus of discussion here is the RBM.

Restricted Boltzmann Machines (RBM)

Restricted Boltzman Machine Algorithm
  • What makes RBMs different from Boltzmann machines is that the visible nodes aren’t connected to each other, and the hidden nodes aren’t connected with each other. Other than that, RBMs are exactly the same as Boltzmann machines.
  • It is a probabilistic, unsupervised, generative deep machine learning algorithm.
  • RBM’s objective is to find the joint probability distribution that maximizes the log-likelihood function.
  • RBM is undirected and has only two layers, Input layer, and hidden layer
  • All visible nodes are connected to all the hidden nodes. RBM has two layers, visible layer or input layer and hidden layer so it is also called an asymmetrical bipartite graph.
  • No intralayer connection exists between the visible nodes. There is also no intralayer connection between the hidden nodes. There are connections only between input and hidden nodes.
  • The original Boltzmann machine had connections between all the nodes. Since RBM restricts the intralayer connection, it is called a Restricted Boltzmann Machine.
  • Since RBMs are undirected, they don’t adjust their weights through gradient descent and backpropagation. They adjust their weights through a process called contrastive divergence. At the start of this process, weights for the visible nodes are randomly generated and used to generate the hidden nodes. These hidden nodes then use the same weights to reconstruct visible nodes. The weights used to reconstruct the visible nodes are the same throughout. However, the generated nodes are not the same because they aren’t connected to each other.

Simple Understanding of RBM

Problem Statement: Let’s take an example of a small café just across a street where people come in the evening to hang out. We see that normally three people: Geeta, Meeta and Paavit visit frequently. Not always all of them show up together. We have all the possible combinations of these three people showing up. It could be just Geeta, Meeta or Paavit show up or Geeta and Meeta come at the same time or Paavit and Meeta or Paavit and Geeta or all three of them show up or none of them show up on some days. All the possibilities are valid.

Left to Right: Geeta, Meeta, Paavit

Let’s say you watch them coming every day and make a note of it. On the first day, Meeta and Geeta come and Paavit doesn’t. On the second day, Paavit comes but Geeta and Meeta don’t. After observing for 15 days, you find that only these two combinations are repeated, as represented in the table.

Visits of Geeta, Meeta and Paavit to the cafe

That’s an interesting finding, and more so when we come to know that these three people are totally unknown to each other. You also find out that there are two café managers: Ratish and Satish. Let’s tabulate it again with 5 people now (3 visitors and 2 managers).

Visits of customer and presence of manager on duty

We find that Geeta and Meeta like Ratish, so they show up when Ratish is on duty. Paavit likes Satish, so he shows up only when Satish is on duty. So, looking at the data, we might say that Geeta and Meeta went to the café on the days Ratish was on duty and Paavit went when Satish was on duty. Let’s add some weights.

Customers and Managers relation with weights

Since the customers appear in our dataset, we call them the visible layer. The managers do not appear in the dataset, so we call them the hidden layer. This is an example of a Restricted Boltzmann Machine (RBM).

RBM

(… to be continued…)

Working of RBM

RBM is a stochastic neural network, which means that each neuron will have some random behavior when activated. There are two other layers of bias units (hidden bias and visible bias) in an RBM; this is what makes RBMs different from autoencoders. The hidden bias helps the RBM produce the activations on the forward pass, and the visible bias helps the RBM reconstruct the input during the backward pass. The reconstructed input is always different from the actual input, as there are no connections among the visible units and therefore no way of transferring information among themselves.

Step 1

The above image shows the first step in training an RBM with multiple inputs. The inputs are multiplied by the weights and then added to the bias. The result is then passed through a sigmoid activation function and the output determines if the hidden state gets activated or not. Weights will be a matrix with the number of input nodes as the number of rows and the number of hidden nodes as the number of columns. The first hidden node will receive the vector multiplication of the inputs multiplied by the first column of weights before the corresponding bias term is added to it.

Here is the formula of the sigmoid function:

S(x) = 1 / (1 + e^(-x))

So the equation that we get in this step would be:

h(1) = S(W^T v(0) + a)

where h(1) and v(0) are the corresponding vectors (column matrices) for the hidden and the visible layers with the superscript as the iteration (v(0) means the input that we provide to the network), and a is the hidden layer bias vector.

(Note that we are dealing with vectors and matrices here and not one-dimensional values.)
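The forward pass described above can be sketched in plain Python. This is a minimal illustration, not the RBM from the figures: the weights, biases, and input vector below are made-up example values, and toy sizes (3 visible units, 2 hidden units) are assumed.

```python
import math

def sigmoid(x):
    # Standard logistic function: S(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

def forward_pass(v, W, a):
    """Compute hidden-unit activation probabilities.

    v: visible vector (length n_visible)
    W: weight matrix as nested lists, n_visible rows x n_hidden columns
    a: hidden-layer bias vector (length n_hidden)
    """
    n_hidden = len(a)
    h = []
    for j in range(n_hidden):
        # Dot product of the input with the j-th column of weights, plus bias
        pre_activation = sum(v[i] * W[i][j] for i in range(len(v))) + a[j]
        h.append(sigmoid(pre_activation))
    return h

# Hypothetical example: 3 visible units, 2 hidden units
v0 = [1, 0, 1]
W = [[0.5, -0.2],
     [0.3,  0.8],
     [-0.1, 0.4]]
a = [0.0, 0.1]
print(forward_pass(v0, W, a))  # two values in (0, 1)
```

Each hidden node receives the input vector multiplied by its column of weights plus its bias, exactly as the text describes.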

Now this image shows the reverse phase, or the reconstruction phase. It is similar to the first pass but in the opposite direction. The equation comes out to be:

v(1) = S(W h(1) + b)

where v(1) and h(1) are the corresponding vectors (column matrices) for the visible and the hidden layers with the superscript as the iteration, and b is the visible layer bias vector.

Now, the difference v(0)−v(1) can be considered as the reconstruction error that we need to reduce in subsequent steps of the training process. So the weights are adjusted in each iteration so as to minimize this error and this is what the learning process essentially is.

In the forward pass, we are calculating the probability of output h(1) given the input v(0) and the weights W, denoted by:

p(h(1) | v(0); W)

And in the backward pass, while reconstructing the input, we are calculating the probability of output v(1) given the input h(1) and the weights W, denoted by:

p(v(1) | h(1); W)

The weights used in both the forward and the backward pass are the same. Together, these two conditional probabilities lead us to the joint distribution of inputs and the activations:

p(v, h)

Reconstruction is different from regression or classification in that it estimates the probability distribution of the original input instead of associating a continuous/discrete value to an input example. This means it is trying to guess multiple values at the same time. This is known as generative learning as opposed to discriminative learning that happens in a classification problem (mapping input to labels).

Let us try to see how the algorithm reduces loss or simply put, how it reduces the error at each step. Assume that we have two normal distributions, one from the input data (denoted by p(x)) and one from the reconstructed input approximation (denoted by q(x)). The difference between these two distributions is our error in the graphical sense and our goal is to minimize it, i.e., bring the graphs as close as possible. This idea is represented by a term called the Kullback–Leibler divergence.

KL-divergence measures the non-overlapping areas under the two graphs and the RBM’s optimization algorithm tries to minimize this difference by changing the weights so that the reconstruction closely resembles the input. The graphs on the right-hand side show the integration of the difference in the areas of the curves on the left.
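As a simple numeric illustration of this idea (not the RBM’s actual training code), the KL divergence between two small discrete distributions can be computed directly; the values of p and q below are made-up examples:

```python
import math

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); assumes all p_i, q_i > 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.4, 0.6]  # "true" input distribution (example values)
q = [0.5, 0.5]  # reconstructed approximation (example values)

print(kl_divergence(p, p))  # 0.0 -- identical distributions have zero divergence
print(kl_divergence(p, q))  # positive -- grows as q drifts away from p
```

Minimizing this quantity corresponds to bringing the two curves in the figure as close together as possible.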

This gives us intuition about our error term. Now, to see how actually this is done for RBMs, we will have to dive into how the loss is being computed. All common training algorithms for RBMs approximate the log-likelihood gradient given some data and perform gradient ascent on these approximations.

Contrastive Divergence

Here is the pseudo-code for the CD algorithm:

CD Algorithm pseudo code
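The pseudo-code figure is not reproduced here; as a rough, illustrative CD-1 sketch in plain Python (toy sizes, zero-initialized weights, and a simplified update that uses activation probabilities directly — not a production implementation):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample(probs):
    # Turn activation probabilities into binary states
    return [1 if random.random() < p else 0 for p in probs]

def cd1_step(v0, W, a, b, lr=0.1):
    """One contrastive-divergence (CD-1) update on a single training example."""
    n_vis, n_hid = len(v0), len(a)
    # Positive phase: hidden probabilities given the data
    h0 = [sigmoid(sum(v0[i] * W[i][j] for i in range(n_vis)) + a[j])
          for j in range(n_hid)]
    h0_state = sample(h0)
    # Negative phase: reconstruct the visible layer, then the hidden layer again
    v1 = [sigmoid(sum(h0_state[j] * W[i][j] for j in range(n_hid)) + b[i])
          for i in range(n_vis)]
    h1 = [sigmoid(sum(v1[i] * W[i][j] for i in range(n_vis)) + a[j])
          for j in range(n_hid)]
    # Approximate log-likelihood gradient: <v0 h0> - <v1 h1>
    for i in range(n_vis):
        for j in range(n_hid):
            W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
    return W

random.seed(0)
W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # 3 visible x 2 hidden, toy init
a, b = [0.0, 0.0], [0.0, 0.0, 0.0]
W = cd1_step([1, 0, 1], W, a, b)
print(W)
```

The positive phase pushes the weights toward the data, the negative phase pulls them away from the model’s own reconstruction, which is the gradient-ascent approximation described above.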

Applications:
* Pattern recognition: RBMs are used for feature extraction in pattern recognition problems where the challenge is to understand handwritten text or a random pattern.
* Recommendation engines: RBMs are widely used in collaborative filtering, where they predict what should be recommended to the end user so that the user enjoys using a particular application or platform. For example: movie recommendation, book recommendation.
* Radar target recognition: Here, RBMs are used for intra-pulse detection in radar systems that have very low SNR and high noise.

Source: Wikipedia (https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine)

Cloud Computing Basics – 1

Cloud computing is a computing term, or metaphor, that evolved in the late 2000s, based on utility and consumption of computer resources. Cloud computing is about moving computing from single desktop PCs and data centers to the Internet.

Figure 1.1: Cloud computing terms

Cloud: The “Cloud” is the default symbol of the Internet in diagrams.

Computing: The broader term “Computing” encompasses computation, coordination logic, and storage.

Chapter 1.1: Fundamentals of cloud computing

Let’s take an example: you wish to play the Ninja Fighters game with your friend on your smartphone. You go to the app store, download the app, log in, find your friend and within five minutes, you’re having fun. This ability to request services for yourself when you need them is, in cloud computing terms, known as on-demand self-service. You didn’t need to go to a physical store, you didn’t need to call someone to place an order, and you didn’t need to sit on hold or wait for anyone else to do anything for you. Another example is Gmail. You don’t need to install any software, nor do you need hard disk space to save your emails; it’s all in the “cloud” managed by Google. In cloud computing, you don’t care what kind of software it is; all you care about is that the service offered is available and reliable. As more users join the game, the cloud is able to quickly grow or shrink to meet the change in demand—elasticity in techie terms. This is possible because a cloud provider, like IBM, has a massive number of servers pooled together that can be balanced between its various customers. But ultimately, you don’t care as long as it’s available for you.

1.1.1 Features of Cloud Computing

Figure 2: NIST Visual Model of Cloud Computing Definition


The generally accepted definition of Cloud Computing comes from the National Institute of Standards and Technology (NIST). The NIST definition runs to several hundred words but essentially says that:

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

What this simply means is the ability for end users to utilize parts of bulk resources and that these resources can be acquired quickly and easily. NIST also offers up several characteristics that it sees as essential for a service to be considered “Cloud”. These characteristics include:

  • On-demand self-service: The ability for an end user to sign up and receive services without the long delays that have characterized traditional IT
  • Broad network access: Ability to access the service via standard platforms (desktop, laptop, mobile etc)
  • Resource pooling: Resources are pooled across multiple customers
  • Rapid elasticity: Capability can scale to cope with demand peaks
  • Measured Service: Billing is metered and delivered as a utility service

1.1.2 Types of Cloud

With cloud computing technology, large pools of resources can be connected through private or public networks.

Figure 3: Types of Cloud (Deployment Model)

What are the differences between these types of cloud computing, and how can you determine the right cloud path for your organization? Here are some fundamentals of each to help with the decision-making process.

Public

Public clouds are made available to the general public by a service provider who hosts the cloud infrastructure. Generally, public cloud providers like Amazon AWS, Microsoft and Google own and operate the infrastructure and offer access over the Internet. With this model, customers have no visibility or control over where the infrastructure is located. It is important to note that all customers on public clouds share the same infrastructure pool with limited configuration, security protections and availability variances.

Public Cloud customers benefit from economies of scale, because infrastructure costs are spread across all users, allowing each individual client to operate on a low-cost, “pay-as-you-go” model. Another advantage of public cloud infrastructures is that they are typically larger in scale than an in-house enterprise cloud, which provides clients with seamless, on-demand scalability. These clouds offer the greatest level of efficiency in shared resources; however, they are also more vulnerable than private clouds.

A public cloud is the obvious choice when:

  • Your standardized workload for applications is used by a lot of people, such as e-mail.
  • You need to test and develop application code.
  • You need incremental capacity, i.e., the ability to add resources for peak times.
  • You’re doing collaboration projects.

Private

Private cloud is cloud infrastructure dedicated to a particular organization. Private clouds allow businesses to host applications in the cloud while addressing concerns regarding data security and control, which are often lacking in a public cloud environment. It is not shared with other organizations, whether managed internally or by a third party, and it can be hosted internally or externally.

There are two variations of private clouds:

On-Premise Private Cloud: This type of cloud is hosted within an organization’s own facility. A business’s IT department would incur the capital and operational costs for the physical resources with this model. On-premise private clouds are best used for applications that require complete control and configurability of the infrastructure and security.

Externally Hosted Private Cloud: Externally hosted private clouds are also exclusively used by one organization, but are hosted by a third party specializing in cloud infrastructure. The service provider facilitates an exclusive cloud environment with a full guarantee of privacy. This format is recommended for organizations that prefer not to use a public cloud infrastructure due to the risks associated with the sharing of physical resources.

Undertaking a private cloud project requires a significant level and degree of engagement to virtualize the business environment, and it will require the organization to reevaluate decisions about existing resources. Private clouds are more expensive but also more secure when compared to public clouds. An Info-Tech survey shows that 76% of IT decision-makers will focus exclusively on the private cloud, as these clouds offer the greatest level of security and control.

When to opt for a Private Cloud?

  • You need data sovereignty but want cloud efficiencies
  • You want consistency across services
  • You have more server capacity than your organization can use
  • Your data center must become more efficient
  • You want to provide private cloud services

Hybrid

Hybrid clouds are a composition of two or more clouds (private, community or public) that remain unique entities but are bound together, offering the advantages of multiple deployment models. In a hybrid cloud, you can leverage third-party cloud providers in either a full or partial manner, increasing the flexibility of computing. Augmenting a traditional private cloud with the resources of a public cloud can be used to manage any unexpected surges in workload.

Hybrid cloud architecture requires both on-premise resources and off-site server based cloud infrastructure. By spreading things out over a hybrid cloud, you keep each aspect of your business in the most efficient environment possible. The downside is that you have to keep track of multiple cloud security platforms and ensure that all aspects of your business can communicate with each other.

Here are a couple of situations where a hybrid environment is best:

  • Your company wants to use a SaaS application but is concerned about security.
  • Your company offers services that are tailored for different vertical markets. You can use a public cloud to interact with the clients but keep their data secured within a private cloud.
  • You can provide public cloud to your customers while using a private cloud for internal IT.

Community

A community cloud is a multi-tenant cloud service model that is shared among several organizations and that is governed, managed and secured commonly by all the participating organizations or a third-party managed service provider. Community clouds are a hybrid form of private clouds built and operated specifically for a targeted group. These communities have similar cloud requirements and their ultimate goal is to work together to achieve their business objectives.

The goal of community clouds is to have participating organizations realize the benefits of a public cloud with the added level of privacy, security, and policy compliance usually associated with a private cloud. Community clouds can be either on-premise or off-premise.

Here are a couple of situations where a community cloud environment is best:

  • Government organizations within a state that need to share resources
  • A private HIPAA compliant cloud for a group of hospitals or clinics
  • Telco community cloud for telco DR to meet specific FCC regulations

Cloud computing is about shared IT infrastructure or the outsourcing of a company’s technology.  It is essential to examine your current IT infrastructure, usage and needs to determine which type of cloud computing can help you best achieve your goals.  Simply, the cloud is not one concrete term, but rather a metaphor for a global network and how to best utilize its advantages depends on your individual cloud focus.

1.1.3 Advantages & Disadvantages

Advantages of Cloud Computing

Cloud computing presents a huge opportunity for businesses. Let’s look at some of its advantages:

Cost Efficient

Cloud computing is probably the most cost-efficient method to use, maintain and upgrade. Traditional desktop software costs companies a lot in terms of finance; adding up the licensing fees for multiple users can prove to be very expensive for the establishment concerned. The cloud is available at much cheaper rates and can significantly lower the company’s IT expenses. Besides, there are many one-time-payment, pay-as-you-go and other scalable options available, which makes it reasonable.

Almost Unlimited Storage

Storing information in the cloud gives you almost unlimited storage capacity. Hence, you no longer need to worry about running out of storage space or increasing your current storage space availability.

Backup and Recovery

Since all your data is stored in the cloud, backing it up and restoring the same is relatively much easier than storing the same on a physical device. Furthermore, most cloud service providers are usually competent enough to handle recovery of information. Hence, this makes the entire process of backup and recovery much simpler than other traditional methods of data storage.

Automatic Software Integration

In the cloud, software integration is usually something that occurs automatically. This means that you do not need to take additional efforts to customize and integrate your applications as per your preferences. This aspect usually takes care of itself. You can also handpick just those services and software applications that you think will best suit your particular enterprise.  

Easy Access to Information

Once you register yourself in the cloud, you can access the information from anywhere, where there is an Internet connection.

Quick Deployment

Cloud computing gives you the advantage of quick deployment. Once you opt for this method of functioning, your entire system can be fully functional in a matter of a few minutes, depending upon the exact kind of technology that you need for your business.

Disadvantages of Cloud Computing

Cloud computing also has some challenges, such as:

Technical Issues

Though it is true that information and data on the cloud can be accessed anytime and from anywhere at all, there are times when this system can have some serious dysfunction. Technology is always prone to outages and other technical issues. Even the best cloud service providers run into this kind of trouble, in spite of keeping up high standards of maintenance. Besides, you will need a very good Internet connection to be logged onto the server at all times. You will invariably be stuck in case of network and connectivity problems. 

Security in the Cloud

The other major issue with the cloud is security. Before adopting this technology, you should know that you will be surrendering all your company’s sensitive information to a third-party cloud service provider. This could potentially put your company at great risk. Hence, you need to make absolutely sure that you choose the most reliable service provider, who will keep your information totally secure.

Prone to Attack

Storing information in the cloud could make your company vulnerable to external hack attacks and threats. As you are well aware, nothing on the internet is completely secure and hence, there is always the lurking possibility of theft of sensitive data.

What are the basics of Python programming?
  1. What is Python?

 Python is a high-level computer programming language famous for its simplicity. Guido van Rossum created Python in the late 1980s and released it in 1991. Python supports several programming paradigms, including object-oriented and functional programming. Python has a vast standard library and ecosystem that provide numerous modules and frameworks for many practical applications like web development, data analysis, AI (artificial intelligence), scientific computing and much more.

I. Introduction

  • Why learn Python?

 There are several reasons to learn Python:

  • Ease of Learning: Python’s straightforward and clean syntax makes it accessible for beginners.
  • Versatility: It’s applicable in diverse domains like web development, data analysis, machine learning, artificial intelligence, scientific computing, etc.
  • Large Community and Libraries: Python has a massive community that contributes to its ecosystem by creating libraries and frameworks, allowing developers to accomplish tasks more efficiently.
  • Career Opportunities: Python is widely used across industries, and proficiency in Python opens up job opportunities in software development, data science, machine learning, and more.
  • High Demand: Due to its versatility and ease of use, Python developers are in high demand in the job market.

C. Brief history and popularity

  • History: Python was conceived in the late 1980s by Guido van Rossum, and its implementation began in December 1989. It was officially released in 1991 as Python 0.9. Python 2.x and Python 3.x are the two major versions coexisting for some time, with Python 2.x being officially discontinued in 2020 in favor of Python 3.x.
  • Popularity: Python’s popularity has surged over the years due to its simplicity, readability, versatility, and an extensive community-driven ecosystem. It’s used by both beginners and experienced developers for various purposes, contributing to its widespread adoption across industries. Its popularity is evident in fields like web development (Django, Flask), data science (Pandas, NumPy), machine learning (TensorFlow, PyTorch), and more.

II. Setting Up Python

A. Installing Python:

  1. Download Python: Visit the official Python website at python.org, navigate to the Downloads section, and select the version of Python suitable for your operating system (Windows, macOS, or Linux).
  2. Install Python: Run the installer and follow the installation instructions. Make sure to check the box that says “Add Python to PATH” during installation on Windows. This makes it easier to run Python from the command line.

B. Using Integrated Development Environments (IDEs) or Text Editors:

  1. IDEs: Integrated Development Environments like PyCharm, VSCode with Python extensions, Jupyter Notebook, or Spyder provide an all-in-one solution with features like code highlighting, debugging tools, and project management. Install an IDE of your choice by downloading it from the respective website and follow the setup instructions.
  2. Text Editors: Text editors like Sublime Text, Atom, or Notepad++ are simpler compared to IDEs but still support Python development. You write code and execute it separately. After installing a text editor, create a new file and save it with a .py extension (e.g., hello.py) to write Python code.

C. Running the First Python Program (Hello, World!):

  1. Using IDEs:
    1. Open your IDE.
    2. Create a new Python file.
    3. Type the following code:

python

print("Hello, World!")

    4. Save the file.
    5. Run the code using the “Run” or “Execute” button in the IDE. You should see “Hello, World!” printed in the output console.
  2. Using Text Editors:
    1. Open your chosen text editor.
    2. Create a new file and type:

print("Hello, World!")

    3. Save the file with a .py extension (e.g., hello.py).
    4. Open a command line or terminal.
    5. Navigate to the directory where your Python file is saved using the cd (change directory) command.
    6. Type python hello.py (replace hello.py with your file name) and press Enter.
    7. You should see “Hello, World!” printed in the terminal.

Congratulations! You’ve successfully installed Python, chosen an environment to write code (IDE or text editor), and executed your first Python program displaying “Hello, World!”

III. Basics of Python Programming

A. Syntax and Indentation:

  • Syntax: Python’s syntax is clear and readable. It uses indentation to define blocks of code instead of using curly braces {} or keywords like end in other languages. Proper indentation (usually four spaces) is crucial for Python to understand the code structure correctly.
  • Example:

if 5 > 2:
    print("Five is greater than two")

B. Variables and Data Types:

  1. Variables: In Python, variables are used to store data. They can be assigned different data types and values during the program’s execution.
  2. Data Types: Python has several data types:
    1. Integers (int): Whole numbers without decimals.
    2. Floats (float): Numbers with decimals.
    3. Strings (str): Ordered sequences of characters enclosed in single (' ') or double (" ") quotes.
    4. Booleans (bool): Represents True or False values.
  3. Example:

# Variable assignment
my_integer = 5
my_float = 3.14
my_string = "Hello, World!"
my_boolean = True

C. Operators:

  1. Arithmetic Operators: Used for basic mathematical operations such as addition, subtraction, multiplication, division, etc.

python

# Examples of arithmetic operators
a = 10
b = 5
print(a + b)   # Addition
print(a - b)   # Subtraction
print(a * b)   # Multiplication
print(a / b)   # Division
print(a % b)   # Modulus (remainder)
print(a ** b)  # Exponentiation

  • Comparison Operators: Used to compare values and return True or False.

python

# Examples of comparison operators
x = 10
y = 5
print(x == y)  # Equal to
print(x != y)  # Not equal to
print(x > y)   # Greater than
print(x < y)   # Less than
print(x >= y)  # Greater than or equal to
print(x <= y)  # Less than or equal to

  • Logical Operators: Used to combine conditional statements.

python

# Examples of logical operators
p = True
q = False
print(p and q)  # Logical AND
print(p or q)   # Logical OR
print(not p)    # Logical NOT

D. Control Structures:

  1. Conditionals (if, elif, else): Used to make decisions in the code based on certain conditions.

python

# Example of conditional statements
age = 18
if age >= 18:
    print("You are an adult")
elif age >= 13:
    print("You are a teenager")
else:
    print("You are a child")

  • Loops (for, while): Used for iterating over a sequence (for loop) or executing a block of code while a condition is True (while loop).

python

# Example of loops
# For loop
for i in range(5):
    print(i)

# While loop
count = 0
while count < 5:
    print(count)
    count += 1

IV. Data Structures in Python

A. Lists:

  • Definition: Lists are ordered collections of items or elements in Python. They are mutable, meaning the elements within a list can be changed or modified after the list is created.
  • Syntax: Lists are created by enclosing elements within square brackets [], separated by commas.
  • Example:

python

# Creating a list
my_list = [1, 2, 3, 4, 5]

B. Tuples:

  • Definition: Tuples are similar to lists but are immutable, meaning the elements cannot be changed once the tuple is created.
  • Syntax: Tuples are created by enclosing elements within parentheses (), separated by commas.
  • Example:

python

# Creating a tuple
my_tuple = (1, 2, 3, 4, 5)

C. Dictionaries:

  • Definition: Dictionaries are collections of key-value pairs (insertion-ordered since Python 3.7). They are mutable and indexed by unique keys. Each key is associated with a value, similar to a real-life dictionary where words (keys) have definitions (values).
  • Syntax: Dictionaries are created by enclosing key-value pairs within curly braces {}, separated by commas and using a colon : to separate keys and values.
  • Example:

python

# Creating a dictionary
my_dict = {'name': 'Alice', 'age': 25, 'city': 'New York'}
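Once created, values are looked up and modified through their keys; for instance:

```python
my_dict = {'name': 'Alice', 'age': 25, 'city': 'New York'}

print(my_dict['name'])        # Alice
my_dict['age'] = 26           # update an existing key
my_dict['country'] = 'USA'    # add a new key-value pair
print(my_dict.get('email'))   # None -- .get() avoids a KeyError for missing keys
```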

D. Sets:

  • Definition: Sets are unordered collections of unique elements. They do not allow duplicate elements.
  • Syntax: Sets are created by enclosing elements within curly braces {}, separated by commas.
  • Example:

python

# Creating a set
my_set = {1, 2, 3, 4, 5}

Key Points:

  • Lists and tuples are ordered collections, but lists are mutable while tuples are immutable.
  • Dictionaries use key-value pairs to store data, allowing quick retrieval of values using their associated keys.
  • Sets are unordered collections of unique elements; they are useful for mathematical set operations like union, intersection, etc., and do not allow duplicate elements.
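For instance, the mathematical set operations mentioned above work directly on set objects via operators:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

print(a | b)  # Union: {1, 2, 3, 4, 5, 6}
print(a & b)  # Intersection: {3, 4}
print(a - b)  # Difference: {1, 2}
print(a ^ b)  # Symmetric difference: {1, 2, 5, 6}
```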

These data structures provide flexibility in storing and manipulating data in Python, each with its own characteristics and best-use cases. Understanding how to use them effectively can greatly enhance your ability to work with data in Python programs.

V. Functions and Modules

A. Defining Functions:

  • Definition: Functions in Python are blocks of reusable code designed to perform a specific task. They improve code modularity and reusability.
  • Syntax: Functions are defined using the def keyword, followed by the function name and parentheses containing optional parameters. The block of code inside the function is indented.
  • Example:

python

# Defining a function
def greet():
    print("Hello, welcome!")

B. Passing Arguments and Returning Values:

  • Arguments: Functions can accept parameters (arguments) to perform their tasks dynamically.
    • Positional Arguments: Defined based on the order they are passed.
    • Keyword Arguments: Defined by specifying the parameter name when calling the function.
  • Return Values: Functions can return values using the return statement.
  • Example:

python

# Function with arguments and return value
def add(a, b):
    return a + b

result = add(3, 5)  # Passing arguments
print("Result:", result)  # Output: Result: 8
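A quick sketch of the positional vs. keyword argument distinction described above (the function describe is a made-up example):

```python
def describe(name, age=0):
    return f"{name} is {age}"

# Positional arguments: matched by the order they are passed
print(describe("Alice", 25))         # Alice is 25

# Keyword arguments: matched by parameter name, so order doesn't matter
print(describe(age=30, name="Bob"))  # Bob is 30
```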

C. Working with Modules and Libraries:

  • Modules: Python modules are files containing Python code, which can define functions, classes, and variables. They can be imported into other Python scripts to reuse the code.
  • Libraries: Libraries are collections of modules that provide pre-written functionalities to ease development tasks.
  • Importing Modules/Libraries: Use the import keyword to import modules and libraries in your Python script.
  • Example:

python

# Importing a module
import math  # Importing the math module

# Using functions from the imported module
print(math.sqrt(16))  # Output: 4.0 (square root function from math module)

  • Creating and Using Your Own Modules: You can create your own modules by writing Python code in a separate file and importing it into your script.
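As a minimal sketch of creating your own module (the module name greetings.py and its contents are made up; normally you would write the file by hand and simply import it):

```python
import importlib.util
import os
import tempfile

# Write a small module file to a temporary directory
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'greetings.py')
with open(path, 'w') as f:
    f.write('def say_hello(name):\n    return "Hello, " + name + "!"\n')

# Load the module explicitly by file path
spec = importlib.util.spec_from_file_location('greetings', path)
greetings = importlib.util.module_from_spec(spec)
spec.loader.exec_module(greetings)

print(greetings.say_hello('Python'))  # Hello, Python!
```

In everyday use, a module saved next to your script can be imported with a plain `import greetings`; the explicit loading above just makes the example self-contained.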

VI. File Handling in Python

A. Reading from and Writing to Files:

Reading from Files (open() and read()):

  • To read from a file, you can use the open() function in Python, which opens a file and returns a file object. The read() method is used to read the contents of the file.
  • Syntax for Reading:

python

# Reading from a file
file = open('file.txt', 'r')  # Opens the file in read mode ('r')
content = file.read()         # Reads the entire file content
print(content)
file.close()                  # Close the file after reading

Writing to Files (open() and write()):

  • To write to a file, open it with the appropriate mode (‘w’ for write, ‘a’ for append). The write() method is used to write content to the file.
  • Syntax for Writing:

python

# Writing to a file
file = open('file.txt', 'w')   # Opens the file in write mode ('w')
file.write('Hello, World!\n')  # Writes content to the file
file.close()                   # Close the file after writing

B. File Modes and Operations:

File Modes:

  • Read Mode (‘r’): Opens a file for reading. Raises an error if the file does not exist.
  • Write Mode (‘w’): Opens a file for writing. Creates a new file if it doesn’t exist or truncates the file if it exists.
  • Append Mode (‘a’): Opens a file for appending new content. Creates a new file if it doesn’t exist.
  • Read and Write Mode (‘r+’): Opens a file for both reading and writing.
  • Binary Mode (‘b’): Used in conjunction with other modes (e.g., ‘rb’, ‘wb’) to handle binary files.

File Operations:

  • read(): Reads the entire content of the file or a specified number of bytes.
  • readline(): Reads a single line from the file.
  • readlines(): Reads all the lines of a file and returns a list.
  • write(): Writes content to the file.
  • close(): Closes the file when finished with file operations.
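A short sketch exercising several of these operations together (the file name demo.txt is arbitrary):

```python
# Create a small file to work with
with open('demo.txt', 'w') as f:
    f.write('first line\n')
    f.write('second line\n')

# readline() returns one line at a time
f = open('demo.txt', 'r')
print(f.readline())  # first line
f.close()

# readlines() returns all lines as a list
f = open('demo.txt', 'r')
lines = f.readlines()
print(lines)         # ['first line\n', 'second line\n']
f.close()

# Append mode ('a') adds to the end without truncating
with open('demo.txt', 'a') as f:
    f.write('third line\n')
```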

Using with Statement (Context Manager):

  • The with statement in Python is used to automatically close the file when the block of code is exited. It’s a good practice to use it to ensure proper file handling.
  • Syntax:

python

with open('file.txt', 'r') as file:
    content = file.read()
    print(content)
# File is automatically closed outside the 'with' block

VII. Object-Oriented Programming (OOP) Basics

A. Classes and Objects:

Classes:

  • Classes are blueprints for creating objects in Python. They encapsulate data (attributes) and behaviors (methods) into a single unit.
  • Syntax for Class Declaration:

python

# Class declaration
class MyClass:
    # Class constructor (initializer)
    def __init__(self, attribute1, attribute2):
        self.attribute1 = attribute1
        self.attribute2 = attribute2

    # Class method
    def my_method(self):
        return "This is a method in MyClass"

Objects:

  • Objects are instances of classes. They represent real-world entities and have attributes and behaviors defined by the class.
  • Creating Objects from a Class:

python

# Creating an object of MyClass
obj = MyClass("value1", "value2")

B. Inheritance and Polymorphism:

Inheritance:

  • Inheritance allows a class (subclass/child class) to inherit attributes and methods from another class (superclass/parent class).
  • Syntax for Inheritance:

python

# Parent class
class Animal:
    def sound(self):
        return "Some sound"

# Child class inheriting from Animal
class Dog(Animal):
    def sound(self):  # Overriding the method
        return "Woof!"

Polymorphism:

  • Polymorphism allows objects of different classes to be treated as objects of a common superclass. It enables the same method name to behave differently for each class.
  • Example of Polymorphism:

python

# Polymorphism example
def animal_sound(animal):
    return animal.sound()  # Same method name, different behaviors

# Creating instances of classes
animal1 = Animal()
dog = Dog()

# Calling the function with different objects
print(animal_sound(animal1))  # Output: "Some sound"
print(animal_sound(dog))      # Output: "Woof!"

VIII. Error Handling (Exceptions)

A. Understanding Exceptions:

What are Exceptions?

  • Exceptions are errors that occur during the execution of a program, disrupting the normal flow of the code.
  • Examples include dividing by zero, trying to access an undefined variable, or attempting to open a non-existent file.

Types of Exceptions:

  • Python has built-in exception types that represent different errors that can occur during program execution, like ZeroDivisionError, NameError, FileNotFoundError, etc.

B. Using Try-Except Blocks:

Handling Exceptions with Try-Except Blocks:

  • Try-except blocks in Python provide a way to handle exceptions gracefully, preventing the program from crashing when errors occur.
  • Syntax:

python

try:
    # Code that might raise an exception
    result = 10 / 0  # Example: Division by zero
except ExceptionType as e:
    # Code to handle the exception
    print("An exception occurred:", e)

Handling Specific Exceptions:

  • You can catch specific exceptions by specifying the exception type after the except keyword.
  • Example:

python

try:
    file = open('nonexistent_file.txt', 'r')
except FileNotFoundError as e:
    print("File not found:", e)

Using Multiple Except Blocks:

  • You can use multiple except blocks to handle different types of exceptions separately.
  • Example:

python

try:
    result = 10 / 0
except ZeroDivisionError as e:
    print("Division by zero error:", e)
except Exception as e:
    print("An exception occurred:", e)

Handling Exceptions with Else and Finally:

  • The else block runs if no exceptions are raised in the try block, while the finally block always runs, whether an exception is raised or not.
  • Example:

python

try:
    result = 10 / 2
except ZeroDivisionError as e:
    print("Division by zero error:", e)
else:
    print("No exceptions occurred!")
finally:
    print("Finally block always executes")

IX. Introduction to Python Libraries

A. Overview of Popular Libraries:

  1. NumPy:
    • Description: NumPy is a fundamental package for scientific computing in Python. It provides support for arrays, matrices, and mathematical functions to operate on these data structures efficiently.
    • Key Features:
      • Multi-dimensional arrays and matrices.
      • Mathematical functions for array manipulation.
      • Linear algebra, Fourier transforms, and random number capabilities.
    • Example:

python

import numpy as np

# Creating a NumPy array
arr = np.array([1, 2, 3, 4, 5])

  2. Pandas:
    • Description: Pandas is a powerful library for data manipulation and analysis. It provides data structures like Series and DataFrame, making it easy to handle structured data.
    • Key Features:
      • Data manipulation tools for reading, writing, and analyzing data.
      • Data alignment, indexing, and handling missing data.
      • Time-series functionality.
    • Example:

python

import pandas as pd

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

  3. Matplotlib:
    • Description: Matplotlib is a comprehensive library for creating static, interactive, and animated visualizations in Python. It provides functionalities to visualize data in various formats.
    • Key Features:
      • Plotting 2D and 3D graphs, histograms, scatter plots, etc.
      • Customizable visualizations.
      • Integration with Jupyter Notebook for interactive plotting.
    • Example:

python

import matplotlib.pyplot as plt

# Plotting a simple line graph
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Line Graph')
plt.show()

B. Installing and Importing Libraries:

Installing Libraries using pip:

  • Open a terminal or command prompt and use the following command to install libraries:

pip install numpy pandas matplotlib

Importing Libraries in Python:

  • Once installed, import the libraries in your Python script using import statements:

python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

  • After importing, you can use the functionalities provided by these libraries in your Python code.
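As a quick check that the installs worked, the libraries can be used together in a few lines. A minimal sketch (a NumPy array wrapped in a pandas DataFrame; the column names are arbitrary):

```python
import numpy as np
import pandas as pd

# Build a small DataFrame from a NumPy array to confirm the imports work together
arr = np.arange(6).reshape(3, 2)
df = pd.DataFrame(arr, columns=["a", "b"])
print(df.shape)  # (3, 2)
```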

X. Real-life Examples and Projects

A. Simple Projects for Practice:

  1. To-Do List Application:
    • Create a command-line to-do list application that allows users to add tasks, mark them as completed, delete tasks, and display the list.
  2. Temperature Converter:
    • Build a program that converts temperatures between Celsius and Fahrenheit or other temperature scales.
  3. Web Scraper:
    • Develop a web scraper that extracts information from a website and stores it in a structured format like a CSV file.
  4. Simple Calculator:
    • Create a basic calculator that performs arithmetic operations such as addition, subtraction, multiplication, and division.
  5. Hangman Game:
    • Implement a command-line version of the Hangman game where players guess letters to reveal a hidden word.
  6. Address Book:
    • Develop an address book application that stores contacts with details like name, phone number, and email address.
  7. File Organizer:
    • Write a script that organizes files in a directory based on their file extensions or other criteria.

B. Exploring Python’s Applications in Different Fields:

  1. Web Development (Django, Flask):
    • Python is widely used for web development. Explore frameworks like Django or Flask to build web applications, REST APIs, or dynamic websites.
  2. Data Science and Machine Learning:
    • Use libraries like NumPy, Pandas, Scikit-learn, or TensorFlow to perform data analysis, create machine learning models, or work on predictive analytics projects.
  3. Scientific Computing:
    • Python is used extensively in scientific computing for simulations, modeling, and solving complex mathematical problems. Use libraries like SciPy or SymPy for scientific computations.
  4. Natural Language Processing (NLP):
    • Explore NLP with Python using libraries like NLTK or spaCy for text processing, sentiment analysis, or language translation tasks.
  5. Game Development:
    • Develop simple games using Python libraries like Pygame, allowing you to create 2D games and learn game development concepts.
  6. Automation and Scripting:
    • Create scripts to automate repetitive tasks like file manipulation, data processing, or system administration using Python's scripting capabilities.
  7. IoT (Internet of Things) and Raspberry Pi Projects:
    • Experiment with Python for IoT projects by controlling sensors, actuators, or devices using Raspberry Pi and Python libraries like GPIO Zero.

XI. Conclusion

A. Recap of Key Points:

  1. Python Basics: Python is a high-level, versatile programming language known for its simplicity, readability, and vast ecosystem of libraries and frameworks.
  2. Core Concepts: Understanding Python’s syntax, data types, control structures, functions, and handling exceptions is crucial for effective programming.
  3. Popular Libraries: Libraries like NumPy, Pandas, Matplotlib, etc., offer specialized functionalities for data manipulation, scientific computing, visualization, and more.
  4. Project Ideas: Simple projects, such as to-do lists, calculators, web scrapers, etc., provide practical experience and reinforce learning.
  5. Real-world Applications: Python’s applications span diverse fields like web development, data science, machine learning, scientific computing, automation, IoT, and more.

B. Encouragement for Further Exploration:

  1. Continuous Learning: Python’s versatility and vast ecosystem offer endless opportunities for learning and growth.
  2. Practice and Projects: Build upon your knowledge by working on more complex projects, contributing to open-source, and experimenting with different libraries and domains.
  3. Community Engagement: Engage with the Python community through forums, meetups, conferences, and online platforms to learn, share experiences, and collaborate.
  4. Stay Curious: Python evolves continuously, and exploring new libraries, updates, or trends keeps your skills up-to-date and opens doors to new possibilities.
  5. Persistence: Embrace challenges as learning opportunities. Persistence and dedication in learning Python will yield rewarding results in the long run.

C. Final Thoughts:

Python is an exceptional programming language renowned for its simplicity, readability, and versatility. Its applications span across numerous fields, from web development to scientific computing, data analysis, machine learning, and beyond. Whether you’re a beginner starting your programming journey or an experienced developer seeking new avenues, Python offers a rich ecosystem and a supportive community to aid your exploration and growth.

PART 2: DATA SCIENCE NOV 2023

# NUMPY
# pip install numpy
import numpy as np
nums = range(16)
nums = np.reshape(nums,(8,2))
print(nums)
nums = np.reshape(nums,(4,4))
print(nums)
print("Shape: Rows = ", nums.shape[0], "and columns = ", nums.shape[1])
# indexing
print(nums[1,2], nums[-3,-2])
print(nums[1]) # 2nd row
print(nums[:,1]) # : rows from 0th to (n-1)th
print(nums[-1], nums[:,-2], nums[-1,-2])

# to give your own set of values, you need to provide in terms of list
l1 = [[1,5,7],[2,4,9],[1,1,3],[3,3,2]]
# array is a function to convert list into numpy
mat1 = np.array(l1)
print(mat1)

print(np.zeros((3,3)))
print(np.ones((3,3)))
print(np.full((5,7),2.0))
print(np.full((5,7),9))

# eye - identity matrix: square matrix with 1 on its main diagonal
mat1 = np.eye(5)
print(mat1)

# NUMPY
import numpy as np
# to give your own set of values, you need to provide in terms of list
l1 = [[1,5,7],[2,4,9],[1,1,3],[3,3,2]]
# array is a function to convert list into numpy
mat1 = np.array(l1) # 4 x 3 shape
print(mat1)
l2 = [[2,3,4],[2,1,2],[5,2,3],[3,2,2]]
# array is a function to convert list into numpy
mat2 = np.array(l2)
print(mat2)

# Matrices operations
print(mat1 + mat2)
print(np.add(mat1, mat2))

print(mat1 - mat2)
print(np.subtract(mat1, mat2))

print(mat1 * mat2)
print(np.multiply(mat1, mat2))

print(mat1 / mat2)
print(np.divide(mat1, mat2))

# actual matrix multiplication is done using matmul()
l3 = [[2,3,4],[2,1,2],[5,2,3]]
# array is a function to convert list into numpy
mat3 = np.array(l3)
print(mat3)
print(“Matrix Multiplication”)
print(np.matmul(mat1, mat3))
print(mat1 @ mat3)
## calculating determinant

l4 = [[1,3,5],[1,3,1],[2,3,4]]
mat5 = np.array(l4)
det_mat5 = np.linalg.det(mat5)
print(“Determinant of matrix 5 is”,det_mat5)
print("Inverse of matrix 5 is: \n", np.linalg.inv(mat5))
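Multiplying a matrix by its inverse should return the identity matrix (up to floating-point rounding), which gives a quick sanity check on the result. A sketch using the same mat5 values as above:

```python
import numpy as np

mat5 = np.array([[1, 3, 5], [1, 3, 1], [2, 3, 4]])
inv5 = np.linalg.inv(mat5)
# mat5 @ inv5 should be (numerically) the 3x3 identity matrix
print(np.allclose(mat5 @ inv5, np.eye(3)))  # True
```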

'''
Linear Algebra Equation:
x1 + 5x2 = 7
-2x1 - 7x2 = -5

Solution: x1 = -8, x2 = 3
'''
coeff_mat = np.array([[1,5],[-2,-7]])
#var_mat = np.array([[x1],[x2]])
result_mat = np.array([[7],[-5]])
# equation here is coeff_mat * var_mat = result_mat [eg: 5 * x = 10]
# which is, var_mat = coeff_mat inv * result_mat
det_coeff_mat = np.linalg.det(coeff_mat)
if det_coeff_mat != 0:
    var_mat = np.linalg.inv(coeff_mat) @ result_mat
    print("X1 = ", var_mat[0,0])
    print("X2 = ", var_mat[1,0])
else:
    print("Solution is not possible")
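NumPy also offers np.linalg.solve, which solves the system directly and is generally preferred over forming an explicit inverse. A sketch with the same coefficients as above:

```python
import numpy as np

coeff_mat = np.array([[1, 5], [-2, -7]])
result_mat = np.array([[7], [-5]])
# Solve coeff_mat @ var_mat = result_mat in one step
var_mat = np.linalg.solve(coeff_mat, result_mat)
print(var_mat[0, 0], var_mat[1, 0])  # -8.0 3.0
```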

# # scipy = scientific python
# pip install scipy
'''
#Inequality = OPTIMIZATION or MAXIMIZATION / MINIMIZATION PROBLEM
Computer Parts Assembly:
Laptops & Desktops
profit: 1000, 600
objective: either maximize profit or minimize cost

constraints:
1. Demand: 500, 600
2. Parts: Memory card: 5000 cards available
3. Manpower: 25000 minutes


'''

'''
Optimization using Scipy
let's assume d = desktop, n = notebooks

Constraints:
1. d + n <= 10000
2. 2d + n <= 15000
3. 3d + 4n <= 25000

profit: 1000 d + 750 n => maximize
-1000d - 750 n => minimize

'''
import numpy as np
from scipy.optimize import minimize, linprog
d = 1
n = 1
profit_d = 1000
profit_n = 750
profit = d * profit_d + n * profit_n
obj = [-profit_d, -profit_n]
lhs_con = [[1,1],[2,1],[3,4]]
rhs_con = [10000, 15000, 25000]

boundary = [(0, float("inf")), # boundary condition for # of desktops
            (10, 200000)]      # we just added some limit for notebooks
opt = linprog(c=obj, A_ub=lhs_con, b_ub=rhs_con, bounds=boundary, method="revised simplex")
print(opt)
if opt.success:
    print(f"Number of desktops = {opt.x[0]} and number of laptops = {opt.x[1]}")
    print("Maximum profit that can be generated = ", -1 * opt.fun)
else:
    print("Solution can not be generated")
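Note that recent SciPy releases deprecated and then removed the "revised simplex" method; the current default solver is "highs". A sketch of the same model with it (the default bounds of (0, inf) are used, so the notebook limit above is dropped):

```python
from scipy.optimize import linprog

obj = [-1000, -750]                 # negate profits: maximization as minimization
lhs_con = [[1, 1], [2, 1], [3, 4]]  # demand, parts, manpower constraints
rhs_con = [10000, 15000, 25000]
opt = linprog(c=obj, A_ub=lhs_con, b_ub=rhs_con, method="highs")
if opt.success:
    print(opt.x, -opt.fun)  # [7000. 1000.] 7750000.0
```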

### ### ### PANDAS
# Pandas - dataframe which resembles Table structure
# pip install pandas
import pandas as pd
df1 = pd.DataFrame()
print(df1)
print(type(df1))

# fruit production
data = [["Apple", 15000, 11000, 6000],
        ["Banana", 18000, 22000, 29000],
        ["Mango", 2, 900, 19000],
        ["Guava", 19000, 11000, 25000]]

fruit_production = pd.DataFrame(data)
print(fruit_production)
print("Slicing 1:\n")
print(fruit_production.iloc[1:3,2:]) #based on index
print("Slicing 2:\n")
print(fruit_production.loc[1:3,2:]) #based on title(names)

fruit_production = pd.DataFrame(data,
                                columns=["Fruits","January","February","March"])
print(fruit_production)

fruit_production = pd.DataFrame(data,
                                columns=["Fruits","January","February","March"],
                                index=["Fruit 1","Fruit 2","Fruit 3","Fruit 4"])
print(fruit_production)

## dataframe.loc() dataframe.iloc()

print("Slicing 1:\n")
print(fruit_production.iloc[1:3,2:]) #based on index
print("Slicing 2:\n")
print(fruit_production.loc[["Fruit 2","Fruit 3"],["February","March"]]) #based on title(names)

### ###

# pandas
# pip install pandas
import pandas as pd
l1 = [10,20,30,40,50]
l1 = [["Sachin", 101, 20000, "BATSMAN"], ["Kapil", 501, 12000, "BOWLER"],
      ["Sunil", 12, 21000, "BATSMAN"], ["Zaheer", 725, 2000, "BOWLER"]]
df1 = pd.DataFrame(l1, columns=["Player","Wickets","Runs","Type"],
                   index=["Player 1","Player 2","Player 3","Player 4"])
print(df1)

d1 = {'Apple': [12000, 11000, 13000],
      'Banana': [17000, 18000, 19000],
      'Mango': [11000, 13000, 15000]}
df2 = pd.DataFrame(d1)
print(df2)

# creating dataframe from list of dictionary
data1 = [{"Guava": 9000, "Oranges": 5000},
         {"Guava": 8000, "Oranges": 7000},
         {"Guava": 10000, "Oranges": 6000}]
df3 = pd.DataFrame(data1)
print(df3)

print(df3.iloc[0,:]) #first row and all column values
print(df3.iloc[:,0])

print(df2.iloc[:,0:2])
print(df2.iloc[[0,2],[0,2]])

#
print(df2.loc[[0,2],["Apple","Mango"]])
print(df1.loc[["Player 1","Player 4"],["Player","Runs"]])

df2.iloc[2,0] = 14000
print(df2)
print("========= DF1 =============")
df1['Avg'] = df1['Runs'] / df1["Wickets"]
print(df1)
print("Reading data from DF1: ")
df4 = df1[df1.Player != 'Sachin'] #filter where clause
print("\n\n New dataset without Sachin: \n", df4)
df1 = df1.drop("Player", axis=1) # axis default is 0
# unlike pop() and del - drop() returns a new dataframe
print(df1)


print("Average Wickets of all the players = ", df1['Wickets'].mean())
print("Average Wickets of players by type = \n\n", df1.groupby('Type').mean())
# axis = 0 refers to rows
# axis = 1 refers to columns

print(\n\nDropping columns from DF1: “)
del df1[‘Wickets’] #dropping column Wickets using del
print(df1)

df1.pop(‘Runs’) #dropping column using pop
print(df1)
#

import pandas as pd

ud_df = pd.read_csv("D:/datasets/gitdataset/user_device.csv")
print(ud_df) # 272 rows x 6 columns
print("Rows: ", ud_df.shape[0])
print("Columns: ", ud_df.shape[1])

print(ud_df.tail(1))
print(ud_df.head(1))

use_df = pd.read_csv("D:/datasets/gitdataset/user_usage.csv")
print(use_df) # 240 rows x 4 columns

result_df = pd.merge(use_df[['use_id','monthly_mb','outgoing_sms_per_month',
                             'outgoing_mins_per_month']], ud_df,
                     on='use_id')
print(result_df) # [159 rows x 9 columns] = ud_df: 159 + 113, use_df = 159 + 81

result_df = pd.merge(use_df[['use_id','monthly_mb','outgoing_sms_per_month',
                             'outgoing_mins_per_month']], ud_df,
                     on='use_id', how='outer')
print(result_df)

result_df = pd.merge(use_df[['use_id','monthly_mb','outgoing_sms_per_month',
                             'outgoing_mins_per_month']], ud_df,
                     on='use_id', how='left')
print(result_df)

result_df = pd.merge(use_df[['use_id','monthly_mb','outgoing_sms_per_month',
                             'outgoing_mins_per_month']], ud_df,
                     on='use_id', how='right')
print(result_df)
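When comparing join types like the four merges above, pandas' indicator=True flag adds a _merge column showing whether each row came from the left frame, the right frame, or both, which is handy for auditing an outer join. A sketch on small hypothetical frames (the CSV paths above are local to the author's machine):

```python
import pandas as pd

# Hypothetical stand-ins for user_usage / user_device
left = pd.DataFrame({"use_id": [1, 2, 3], "monthly_mb": [500, 1200, 300]})
right = pd.DataFrame({"use_id": [2, 3, 4], "device": ["A", "B", "C"]})

# indicator=True adds a _merge column: left_only / right_only / both
merged = pd.merge(left, right, on="use_id", how="outer", indicator=True)
print(merged["_merge"].tolist())  # ['left_only', 'both', 'both', 'right_only']
```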

## Working with Pandas – Example ##
import pandas as pd
import numpy as np
df = pd.read_csv("D:/datasets/gitdataset/hotel_bookings.csv")
print(df.shape)
print(df.dtypes)
'''
numeric - int, float
categorical - 1) Nominal - there is no order 2) Ordinal - here order is imp
'''
df_numeric = df.select_dtypes(include=[np.number])
print(df_numeric)

df_object= df.select_dtypes(exclude=[np.number])
print(df_object) # categorical and date columns

print(df.columns)
for col in df.columns:
    missing = np.mean(df[col].isnull())
    if missing > 0:
        print(f"{col}: {missing}")

'''
Phases:
1. Business objective
2. Collect the relevant data
3. Preprocessing - making data ready for use
    a. Handle missing values
    b. Feature scaling - scale the values in the column to similar range
    c. Outliers / data correction
    d. Handling categorical data:
        i. Encode the data to convert text to number
           East = 0, North = 1, South = 2, West = 3
        ii. Column transform into multiple columns
        iii. Delete any one column
4. EDA - Exploratory Data Analysis: to understand the data
5. MODEL BUILDING - Divide the train and test


'''
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/1_Data_PreProcessing.csv")
print(df)

'''
Phases:
1. Business objective
2. Collect the relevant data
3. Preprocessing - making data ready for use
    a. Handle missing values
    b. Feature scaling - scale the values in the column to similar range
    c. Outliers / data correction
    d. Handling categorical data:
        i. Encode the data to convert text to number
           East = 0, North = 1, South = 2, West = 3
        ii. Column transform into multiple columns
        iii. Delete any one column
4. EDA - Exploratory Data Analysis: to understand the data
5. MODEL BUILDING -
    a. Divide the train and test
    b. Run the model
6. EVALUATE THE MODEL:
    a. Measure the performance of each algorithm on the test data
    b. Metric to compare: based on Regression (MSE, RMSE, R square) or
       classification (confusion matrix - accuracy, sensitivity..)
    c. Select the best performing model
7. DEPLOY THE BEST PERFORMING MODEL

Hypothesis test:
1. Null Hypothesis (H0): starting statement (objective)
   Alternate Hypothesis (H1): alternate of H0

Z or T test:
Chi square test: both are categorical

e.g. North zone: 50 WIN 5 LOSS - p = 0.005

# simple (single value) v composite (specifies range)
# two tailed test v one tailed test [H0: mean = 0,
    H1 Left Tailed: mean < 0
    H1 Right Tailed: mean > 0]
# level of significance:
    alpha value: confidence interval - 95%
    p value: p value < 0.05 - we reject Null Hypothesis
'''
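The notes above can be made concrete with scipy.stats. An illustrative one-sample t-test of H0: population mean = 0 (the data here are synthetic, not from the course):

```python
import numpy as np
from scipy import stats

# Synthetic sample whose true mean is 0.5, so H0 (mean = 0) should be rejected
rng = np.random.default_rng(42)
sample = rng.normal(loc=0.5, scale=1.0, size=100)
t_stat, p_value = stats.ttest_1samp(sample, popmean=0)
if p_value < 0.05:
    print("p =", p_value, "-> reject H0")
else:
    print("p =", p_value, "-> fail to reject H0")
```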

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/1_Data_PreProcessing.csv")
X = df.iloc[:,:3].values
y = df.iloc[:,3].values
#print("X: \n")
#print(X)
#print("Y: \n")
#print(y)

# scikit-learn package to perform ML
# install the package by: pip install scikit-learn
# but when you import, it's sklearn

# Complete tutorial on sklearn:
# https://scikit-learn.org/stable/

# 1. Replace the missing values with mean value
from sklearn.impute import SimpleImputer
import numpy as np
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:,1:3])
X[:,1:3] = imputer.transform(X[:,1:3])
#print(X)

# 2. Handling categorical values
# encoding
from sklearn.preprocessing import LabelEncoder
lc = LabelEncoder()
X[:,0] = lc.fit_transform(X[:,0])
print(X)

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/1_Data_PreProcessing.csv")
X = df.iloc[:,:3].values
y = df.iloc[:,3].values
#print("X: \n")
#print(X)
#print("Y: \n")
#print(y)

# scikit-learn package to perform ML
# install the package by: pip install scikit-learn
# but when you import, it's sklearn

# Complete tutorial on sklearn:
# https://scikit-learn.org/stable/

# 1. Replace the missing values with mean value
from sklearn.impute import SimpleImputer
import numpy as np
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:,1:3])
X[:,1:3] = imputer.transform(X[:,1:3])
#print(X)

# 2. Handling categorical values
# encoding
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
lc = LabelEncoder()
X[:,0] = lc.fit_transform(X[:,0])

from sklearn.compose import ColumnTransformer
transform = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [0])], remainder='passthrough')
X=transform.fit_transform(X)
X = X[:,1:] # dropped one column
#print(X)

# 3. splitting it into train and test test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)
print(X_train)
# 4. Scaling / Normalization
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
X_train = scale.fit_transform(X_train[:,3:])
X_test = scale.fit_transform(X_test[:,3:])
print(X_train)

'''
Regression: Output (Marks) is a continuous variable
Algorithm: Simple (as it has only 1 X column) Linear (assuming the dataset is linear) Regression
X - independent variable(s)
Y - dependent variable
'''
import pandas as pd
import matplotlib.pyplot as plt
link = "https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/2_Marks_Data.csv"
df = pd.read_csv(link)
X = df.iloc[:,:1].values
y = df.iloc[:,1].values

'''
# 1. Replace the missing values with mean value
from sklearn.impute import SimpleImputer
import numpy as np
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:,1:3])
X[:,1:3] = imputer.transform(X[:,1:3])
#print(X)

# 2. Handling categorical values
# encoding
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
lc = LabelEncoder()
X[:,0] = lc.fit_transform(X[:,0])

from sklearn.compose import ColumnTransformer
transform = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = transform.fit_transform(X)
X = X[:,1:] # dropped one column
#print(X)
'''

# EDA – Exploratory Data Analysis
plt.scatter(x=df['Hours'], y=df['Marks'])
plt.show()
'''
Scatter plots - show the relationship between X and Y variables. You can have:
1. Positive correlation
2. Negative correlation
3. No correlation
4. Correlation ranges from 0 to +/- 1
5. A correlation value between 0 and +/- 0.5 means little to no correlation
6. A strong correlation value will be closer to +/- 1
7. Equation: straight line => y = mx + c
'''
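The correlation described in the notes above can be computed directly with pandas' corr(). A sketch on small illustrative Hours/Marks values (hypothetical numbers, not the course dataset):

```python
import pandas as pd

# Illustrative data: Marks roughly increase with Hours studied
df_demo = pd.DataFrame({"Hours": [1, 2, 3, 4, 5],
                        "Marks": [25, 35, 42, 55, 58]})
corr = df_demo["Hours"].corr(df_demo["Marks"])
print(round(corr, 3))  # close to +1 -> strong positive correlation
```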
# 3. splitting it into train and test test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=100)
print(X_train)

'''
# 4. Scaling / Normalization
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
X_train = scale.fit_transform(X_train[:,3:])
X_test = scale.fit_transform(X_test[:,3:])
print(X_train)
'''

## RUN THE MODEL
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
# fit - train the model
regressor.fit(X_train, y_train)
print(f"M/Coefficient/Slope = {regressor.coef_} and the Constant = {regressor.intercept_}")

# y = 7.5709072 X + 20.1999196152844
# M/Coefficient/Slope = [7.49202113] and the Constant = 21.593606679699406

y_pred = regressor.predict(X_test)
result_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
print(result_df)

# Analyze the output
'''
Regression: Output (Marks) is a continuous variable
Algorithm: Simple (as it has only 1 X column) Linear (assuming the dataset is linear) Regression
X - independent variable(s)
Y - dependent variable
'''
import pandas as pd
import matplotlib.pyplot as plt
link = "https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/2_Marks_Data.csv"
df = pd.read_csv(link)
X = df.iloc[:,:1].values
y = df.iloc[:,1].values

'''
# 1. Replace the missing values with mean value
from sklearn.impute import SimpleImputer
import numpy as np
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:,1:3])
X[:,1:3] = imputer.transform(X[:,1:3])
#print(X)

# 2. Handling categorical values
# encoding
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
lc = LabelEncoder()
X[:,0] = lc.fit_transform(X[:,0])

from sklearn.compose import ColumnTransformer
transform = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = transform.fit_transform(X)
X = X[:,1:] # dropped one column
#print(X)
'''

# EDA – Exploratory Data Analysis
plt.scatter(x=df['Hours'], y=df['Marks'])
plt.show()
'''
Scatter plots - show the relationship between X and Y variables. You can have:
1. Positive correlation
2. Negative correlation
3. No correlation
4. Correlation ranges from 0 to +/- 1
5. A correlation value between 0 and +/- 0.5 means little to no correlation
6. A strong correlation value will be closer to +/- 1
7. Equation: straight line => y = mx + c
'''
# 3. splitting it into train and test test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=100)
print(X_train)

'''
# 4. Scaling / Normalization
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
X_train = scale.fit_transform(X_train[:,3:])
X_test = scale.fit_transform(X_test[:,3:])
print(X_train)
'''

## RUN THE MODEL
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
# fit - train the model
regressor.fit(X_train, y_train)
print(f"M/Coefficient/Slope = {regressor.coef_} and the Constant = {regressor.intercept_}")

# y = 7.5709072 X + 20.1999196152844
# M/Coefficient/Slope = [7.49202113] and the Constant = 21.593606679699406

y_pred = regressor.predict(X_test)
result_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
print(result_df)

# Analyze the output
from sklearn import metrics
mse = metrics.mean_squared_error(y_true=y_test, y_pred=y_pred)
print("Root Mean Squared Error (Variance) = ", mse**0.5)
mae = metrics.mean_absolute_error(y_true=y_test, y_pred=y_pred)
print("Mean Absolute Error = ", mae)
print("R Square is (Variance)", metrics.r2_score(y_test, y_pred))

## Bias is based on training data
y_pred_tr = regressor.predict(X_train)
mse = metrics.mean_squared_error(y_true=y_train, y_pred=y_pred_tr)
print("Root Mean Squared Error (Bias) = ", mse**0.5)
print("R Square is (Bias)", metrics.r2_score(y_train, y_pred_tr))
## Bias v Variance

import pandas as pd
import matplotlib.pyplot as plt
link = "https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/3_Startups.csv"
df = pd.read_csv(link)
print(df.describe())
X = df.iloc[:,:4].values
y = df.iloc[:,4].values

'''
# 1. Replace the missing values with mean value
from sklearn.impute import SimpleImputer
import numpy as np
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:,1:3])
X[:,1:3] = imputer.transform(X[:,1:3])
#print(X)
'''
# 2. Handling categorical values
# encoding
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
lc = LabelEncoder()
X[:,3] = lc.fit_transform(X[:,3])

from sklearn.compose import ColumnTransformer
transform = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [3])], remainder='passthrough')
X=transform.fit_transform(X)
X = X[:,1:] # dropped one column
print(X)


# EDA – Exploratory Data Analysis
plt.scatter(x=df['Administration'], y=df['Profit'])
plt.show()
plt.scatter(x=df['R&D Spend'], y=df['Profit'])
plt.show()
plt.scatter(x=df['Marketing Spend'], y=df['Profit'])
plt.show()

# 3. splitting it into train and test test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=100)
print(X_train)

'''
# 4. Scaling / Normalization
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
X_train = scale.fit_transform(X_train[:,3:])
X_test = scale.fit_transform(X_test[:,3:])
print(X_train)
'''


## RUN THE MODEL
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
# fit - train the model
regressor.fit(X_train, y_train)
print(f"M/Coefficient/Slope = {regressor.coef_} and the Constant = {regressor.intercept_}")

# y = -3791.2 x Florida -3090.1 x California + 0.82 R&D - 0.05 Admin + 0.022 Marketing + 56650


y_pred = regressor.predict(X_test)
result_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
print(result_df)

# Analyze the output
from sklearn import metrics
mse = metrics.mean_squared_error(y_true=y_test, y_pred=y_pred)
print("Root Mean Squared Error (Variance) = ", mse**0.5)
mae = metrics.mean_absolute_error(y_true=y_test, y_pred=y_pred)
print("Mean Absolute Error = ", mae)
print("R Square is (Variance)", metrics.r2_score(y_test, y_pred))

## Bias is based on training data
y_pred_tr = regressor.predict(X_train)
mse = metrics.mean_squared_error(y_true=y_train, y_pred=y_pred_tr)
print("Root Mean Squared Error (Bias) = ", mse**0.5)
print("R Square is (Bias)", metrics.r2_score(y_train, y_pred_tr))

'''
Case 1: All the columns are taken into account:
Mean Absolute Error = 8696.887641252619
R Square is (Variance) 0.884599945166969
Root Mean Squared Error (Bias) = 7562.5657508560125
R Square is (Bias) 0.9624157828452926
'''
## Testing

import statsmodels.api as sm
import numpy as np
X = np.array(X, dtype=float)
print("Y:\n", y)
summ1 = sm.OLS(y,X).fit().summary()
print("Summary of All X \n----------------\n:", summ1)

import pandas as pd
import matplotlib.pyplot as plt
link = "https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/3_Startups.csv"
df = pd.read_csv(link)
print(df.describe())
X = df.iloc[:,:4].values
y = df.iloc[:,4].values

'''
# 1. Replace the missing values with mean value
from sklearn.impute import SimpleImputer
import numpy as np
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:,1:3])
X[:,1:3] = imputer.transform(X[:,1:3])
#print(X)
'''
# 2. Handling categorical values
# encoding
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
lc = LabelEncoder()
X[:,3] = lc.fit_transform(X[:,3])

from sklearn.compose import ColumnTransformer
transform = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [3])], remainder='passthrough')
X=transform.fit_transform(X)
X = X[:,1:] # dropped one column
print(X)

'''
After doing the Backward Elimination method we realized that the state columns
are not significantly impacting the analysis, hence removing those 2 columns too.
'''
X = X[:,2:] # after backward elimination

# EDA – Exploratory Data Analysis
plt.scatter(x=df['Administration'], y=df['Profit'])
plt.show()
plt.scatter(x=df['R&D Spend'], y=df['Profit'])
plt.show()
plt.scatter(x=df['Marketing Spend'], y=df['Profit'])
plt.show()

# 3. splitting it into train and test test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=100)
print(X_train)

'''
# 4. Scaling / Normalization
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
X_train = scale.fit_transform(X_train[:,3:])
X_test = scale.fit_transform(X_test[:,3:])
print(X_train)
'''


## RUN THE MODEL
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
# fit - train the model
regressor.fit(X_train, y_train)
print(f"M/Coefficient/Slope = {regressor.coef_} and the Constant = {regressor.intercept_}")

# y = -3791.2 x Florida -3090.1 x California + 0.82 R&D - 0.05 Admin + 0.022 Marketing + 56650


y_pred = regressor.predict(X_test)
result_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
print(result_df)

# Analyze the output
from sklearn import metrics
mse = metrics.mean_squared_error(y_true=y_test, y_pred=y_pred)
print("Root Mean Squared Error (Variance) = ", mse**0.5)
mae = metrics.mean_absolute_error(y_true=y_test, y_pred=y_pred)
print("Mean Absolute Error = ", mae)
print("R Square is (Variance)", metrics.r2_score(y_test, y_pred))

## Bias is based on training data
y_pred_tr = regressor.predict(X_train)
mse = metrics.mean_squared_error(y_true=y_train, y_pred=y_pred_tr)
print("Root Mean Squared Error (Bias) = ", mse**0.5)
print("R Square is (Bias)", metrics.r2_score(y_train, y_pred_tr))

'''
Case 1: All the columns are taken into account:
Mean Absolute Error = 8696.887641252619
R Square is (Variance) 0.884599945166969
Root Mean Squared Error (Bias) = 7562.5657508560125
R Square is (Bias) 0.9624157828452926
'''
## Testing

import statsmodels.api as sm
import numpy as np
X = np.array(X, dtype=float)
#X = X[:,[2,3,4]]
print("Y:\n", y)
summ1 = sm.OLS(y,X).fit().summary()
print("Summary of All X \n----------------\n:", summ1)

## Test for linearity
# 1. All features (X) should be correlated with y
# 2. Multicollinearity: within X there should not be any correlation;
#    if there is, keep only one of the correlated features for the analysis
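The two checks above can be sketched with `DataFrame.corr()`. This is a minimal illustration on hypothetical numbers (the column names here are made up for the example, not taken from the Startups dataset):

```python
import pandas as pd

# Hypothetical data to illustrate the two linearity checks
df_chk = pd.DataFrame({
    "rd_spend":  [10, 20, 30, 40, 50],
    "marketing": [11, 19, 31, 39, 52],   # nearly collinear with rd_spend
    "admin":     [5, 3, 6, 2, 7],
    "profit":    [12, 22, 33, 41, 55],
})

# 1. Each feature should correlate with the target
print(df_chk.corr()["profit"].drop("profit"))

# 2. Multicollinearity: inspect feature-feature correlations;
#    if two features are highly correlated, keep only one of them
print(df_chk.drop(columns="profit").corr())
```

Here `rd_spend` and `marketing` are almost perfectly correlated, so only one of them would be kept.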

import pandas as pd
import matplotlib.pyplot as plt
link = "https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/4_Position_Salaries.csv"
df = pd.read_csv(link)
print(df.describe())
X = df.iloc[:,1:2].values
y = df.iloc[:,2].values

”’
# 1. Replace the missing values with mean value
from sklearn.impute import SimpleImputer
import numpy as np
imputer = SimpleImputer(missing_values=np.nan, strategy=’mean’)
imputer = imputer.fit(X[:,1:3])
X[:,1:3] = imputer.transform(X[:,1:3])
#print(X)
”’
”’
# 2. Handling categorical values
# encoding
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
lc = LabelEncoder()
X[:,3] = lc.fit_transform(X[:,3])

from sklearn.compose import ColumnTransformer
transform = ColumnTransformer([(‘one_hot_encoder’, OneHotEncoder(),[3])],remainder=’passthrough’)
X=transform.fit_transform(X)
X = X[:,1:] # dropped one column
print(X)

”’
”’
After doing Backward elemination method we realized that all the state columns
are not significantly impacting the analysis hence removing those 2 columns too.

X = X[:,2:] # after backward elemination
”’
”’
# EDA – Exploratory Data Analysis
plt.scatter(x=df[‘Level’],y=df[‘Salary’])
plt.show()
”’

# 3. Splitting into train and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3, random_state=100)
print(X_train)

from sklearn.linear_model import LinearRegression
from sklearn import metrics
”’
# 4. Scaling / Normalization
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
X_train = scale.fit_transform(X_train[:,3:])
X_test = scale.fit_transform(X_test[:,3:])
print(X_train)
”’
”’
#Since dataset is too small, lets take entire data for training
X_train, y_train = X,y
X_test, y_test = X,y
”’
”’
## RUN THE MODEL

regressor = LinearRegression()
# fit – train the model
regressor.fit(X_train, y_train)
print(f”M/Coefficient/Slope = {regressor.coef_} and the Constant = {regressor.intercept_}”)

# y =
y_pred = regressor.predict(X_test)
result_df =pd.DataFrame({‘Actual’: y_test, ‘Predicted’: y_pred})
print(result_df)

# Analyze the output

mse = metrics.mean_squared_error(y_true=y_test, y_pred=y_pred)
print(“Root Mean Squared Error (Variance) = “,mse**0.5)
mae = metrics.mean_absolute_error(y_true=y_test, y_pred=y_pred)
print(“Mean Absolute Error = “,mae)
print(“R Square is (Variance)”,metrics.r2_score(y_test, y_pred))

## Bias is based on training data
y_pred_tr = regressor.predict(X_train)
mse = metrics.mean_squared_error(y_true=y_train, y_pred=y_pred_tr)
print(“Root Mean Squared Error (Bias) = “,mse**0.5)
print(“R Square is (Bias)”,metrics.r2_score(y_train, y_pred_tr))

# Plotting the data for output
plt.scatter(x=df[‘Level’],y=df[‘Salary’])
plt.plot(X,y_pred)
plt.xlabel(“Level”)
plt.ylabel(“Salary”)
plt.show()
”’

# 3. Model - Polynomial regression analysis
# y = C + m1 * x + m2 * x squared
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline

for i in range(1, 10):
    # prepare the pipeline steps
    parameters = [('polynomial', PolynomialFeatures(degree=i)), ('model', LinearRegression())]
    pipe = Pipeline(parameters)
    pipe.fit(X_train, y_train)
    y_pred = pipe.predict(X)
    ## Bias is based on training data
    y_pred_tr = pipe.predict(X_train)
    mse = metrics.mean_squared_error(y_true=y_train, y_pred=y_pred_tr)
    rmse_tr = mse ** 0.5
    print("Root Mean Squared Error (Bias) = ", rmse_tr)
    print("R Square is (Bias)", metrics.r2_score(y_train, y_pred_tr))

    ## Variance is based on validation data
    y_pred_tt = pipe.predict(X_test)
    mse = metrics.mean_squared_error(y_true=y_test, y_pred=y_pred_tt)
    rmse_tt = mse ** 0.5
    print("Root Mean Squared Error (Variance) = ", rmse_tt)
    print("R Square is (Variance)", metrics.r2_score(y_test, y_pred_tt))
    print("Difference Between variance and bias = ", rmse_tt - rmse_tr)
    # Plotting the data for output
    plt.scatter(x=df['Level'], y=df['Salary'])
    plt.plot(X, y_pred)
    plt.title("Polynomial Analysis degree = " + str(i))
    plt.xlabel("Level")
    plt.ylabel("Salary")
    plt.show()

import pandas as pd
import matplotlib.pyplot as plt
#link = "https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/4_Position_Salaries.csv"
link = "https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/3_Startups.csv"
df = pd.read_csv(link)
print(df.describe())
X = df.iloc[:,0:4].values
y = df.iloc[:,4].values

”’
# 1. Replace the missing values with mean value
from sklearn.impute import SimpleImputer
import numpy as np
imputer = SimpleImputer(missing_values=np.nan, strategy=’mean’)
imputer = imputer.fit(X[:,1:3])
X[:,1:3] = imputer.transform(X[:,1:3])
#print(X)
”’

# 2. Handling categorical values
# encoding
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
lc = LabelEncoder()
X[:,3] = lc.fit_transform(X[:,3])

from sklearn.compose import ColumnTransformer
transform = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [3])], remainder='passthrough')
X = transform.fit_transform(X)
X = X[:,1:] # dropped one column
print(X)


”’
After doing Backward elemination method we realized that all the state columns
are not significantly impacting the analysis hence removing those 2 columns too.

X = X[:,2:] # after backward elemination
”’
”’
# EDA – Exploratory Data Analysis
plt.scatter(x=df[‘Level’],y=df[‘Salary’])
plt.show()
”’

# 3. Splitting into train and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3, random_state=100)
print(X_train)

from sklearn.linear_model import LinearRegression
from sklearn import metrics
”’
# 4. Scaling / Normalization
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
X_train = scale.fit_transform(X_train[:,3:])
X_test = scale.fit_transform(X_test[:,3:])
print(X_train)
”’
”’
#Since dataset is too small, lets take entire data for training
X_train, y_train = X,y
X_test, y_test = X,y
”’

## RUN THE MODEL - Support Vector Machine Regressor (SVR)
from sklearn.svm import SVR
#regressor = SVR(kernel='linear')
#regressor = SVR(kernel='poly', degree=2, C=10)
# Assignment - find the best value for gamma between 0.01 and 1 (e.g. 0.05)
regressor = SVR(kernel="rbf", gamma=0.1, C=10)
# fit – train the model
regressor.fit(X_train, y_train)


# y =
y_pred = regressor.predict(X_test)
result_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
print(result_df)

# Analyze the output

mse = metrics.mean_squared_error(y_true=y_test, y_pred=y_pred)
print("Root Mean Squared Error (Variance) = ", mse**0.5)
mae = metrics.mean_absolute_error(y_true=y_test, y_pred=y_pred)
print("Mean Absolute Error = ", mae)
print("R Square is (Variance)", metrics.r2_score(y_test, y_pred))

## Bias is based on training data
y_pred_tr = regressor.predict(X_train)
mse = metrics.mean_squared_error(y_true=y_train, y_pred=y_pred_tr)
print("Root Mean Squared Error (Bias) = ", mse**0.5)
print("R Square is (Bias)", metrics.r2_score(y_train, y_pred_tr))


# Plotting the data for output
plt.scatter(X_train[:,2],y_pred_tr)
#plt.plot(X_train[:,2],y_pred_tr)
plt.show()

# Decision Tree & Random Forest
import pandas as pd
link = "https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/3_Startups.csv"
link = "D:\\datasets\\3_Startups.csv"
df = pd.read_csv(link)
print(df)

#X = df.iloc[:,:4].values
X = df.iloc[:,:1].values
y = df.iloc[:,-1].values   # target is the last column (Profit)
from sklearn.model_selection import train_test_split
X_train, X_test,y_train,y_test = train_test_split(X,y,test_size=0.25,random_state=100)

from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor()
regressor.fit(X_train,y_train)
y_pred = regressor.predict(X_test)

# Bagging, Boosting, Ensemble
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=10)
regressor.fit(X_train,y_train)
y_pred = regressor.predict(X_test)

## Assignment: run these algorithms and check the RMSE and R Square values

# Ridge, Lasso, ElasticNet
import pandas as pd
link = "https://raw.githubusercontent.com/swapnilsaurav/Dataset/master/student_scores_multi.csv"
df = pd.read_csv(link)
print(df)
X = df.iloc[:,0:3].values
y = df.iloc[:,3].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.85, random_state=100)

from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
lr_ridge = Ridge(alpha=0.8)
lr_ridge.fit(X_train,y_train)
y_ridge_pred = lr_ridge.predict(X_test)

from sklearn.metrics import r2_score
r2_ridge_test = r2_score(y_test, y_ridge_pred)

y_ridge_pred_tr = lr_ridge.predict(X_train)
r2_ridge_train = r2_score(y_train, y_ridge_pred_tr)
print(f"Ridge Regression: Train R2 = {r2_ridge_train} and Test R2 = {r2_ridge_test}")
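Lasso and ElasticNet follow the same fit/predict pattern as Ridge above. A minimal side-by-side sketch on synthetic data (the data and alpha values here are illustrative assumptions, not the student-scores dataset):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Hypothetical synthetic data: y depends linearly on two of three features
rng = np.random.RandomState(100)
X = rng.rand(100, 3)
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.05, 100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=100)

# Compare the three regularized regressors on the same split
for name, model in [("Ridge", Ridge(alpha=0.8)),
                    ("Lasso", Lasso(alpha=0.01)),
                    ("ElasticNet", ElasticNet(alpha=0.01, l1_ratio=0.5))]:
    model.fit(X_train, y_train)
    print(name, "Test R2 =", r2_score(y_test, model.predict(X_test)))
```

Ridge shrinks coefficients (L2), Lasso can zero them out entirely (L1), and ElasticNet mixes the two penalties via `l1_ratio`.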

# Classification algorithms: supervised algorithms which predict a class
”’
classifier: algorithm that we develop
model: training and predicting the outcome
features: the input data (columns)
target: class that we need to predict
classification: binary (2 class outcome) or multiclass (more than 2 classes)

Steps to run the model:
1. get the data
2. preprocess the data
3. eda
4. train the model
5. predict the model
6. evaluate the model

”’
#1. Logistic regression
link = "https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/5_Ads_Success.csv"
import pandas as pd
df = pd.read_csv(link)
X = df.iloc[:,1:4].values
y = df.iloc[:,4].values

from sklearn.preprocessing import LabelEncoder
lc = LabelEncoder()
X[:,0] = lc.fit_transform(X[:,0] )

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.25, random_state=100)

# Scaling as Age and Salary are in different range of values
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)  # use transform only: the scaler is already fit on training data

## Build the model
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(X_train,y_train)
y_pred = classifier.predict(X_test)

# visualize the outcome
X_train = X_train[:,1:]
X_test = X_test[:,1:]
classifier.fit(X_train,y_train)
y_pred = classifier.predict(X_test)
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt
import numpy as np
x_set, y_set = X_train, y_train
X1,X2 = np.meshgrid(np.arange(start = x_set[:,0].min()-1, stop=x_set[:,0].max()+1, step=0.01),
np.arange(start = x_set[:,1].min()-1, stop=x_set[:,1].max()+1, step=0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             cmap=ListedColormap(('red','green')))
plt.show()

# Reference: https://designrr.page/?id=155238&token=545210681&type=FP&h=7849

#1. Logistic regression
link = "https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/5_Ads_Success.csv"
link = "D:\\datasets\\5_Ads_Success.csv"
import pandas as pd
df = pd.read_csv(link)
X = df.iloc[:,1:4].values
y = df.iloc[:,4].values

from sklearn.preprocessing import LabelEncoder
lc = LabelEncoder()
X[:,0] = lc.fit_transform(X[:,0] )

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.25, random_state=100)

# Scaling as Age and Salary are in different range of values
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)  # use transform only: the scaler is already fit on training data

## Build the model
”’
## LOGISTIC REGRESSION
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(X_train,y_train)
y_pred = classifier.predict(X_test)
”’
from sklearn.svm import SVC
”’
## Support Vector Machine – Classifier
classifier = SVC(kernel=’linear’)

classifier = SVC(kernel=’rbf’,gamma=100, C=100)
”’
from sklearn.neighbors import KNeighborsClassifier
## Refer types of distances:
# https://designrr.page/?id=200944&token=2785938662&type=FP&h=7229

classifier = KNeighborsClassifier(n_neighbors=9, metric='minkowski')
classifier.fit(X_train,y_train)
y_pred = classifier.predict(X_test)
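The Minkowski metric used by KNeighborsClassifier generalizes the familiar distances: with exponent p=1 it is Manhattan distance and with p=2 it is Euclidean distance (scikit-learn's default is p=2). A minimal sketch of the formula:

```python
# Minkowski distance: (sum |x_i - y_i|^p) ^ (1/p)
def minkowski(a, b, p):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (0, 0), (3, 4)
print(minkowski(a, b, 1))  # Manhattan distance: 7.0
print(minkowski(a, b, 2))  # Euclidean distance: 5.0
```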

# visualize the outcome
X_train = X_train[:,1:]
X_test = X_test[:,1:]
classifier.fit(X_train,y_train)
y_pred = classifier.predict(X_test)
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt
import numpy as np
x_set, y_set = X_train, y_train
X1,X2 = np.meshgrid(np.arange(start = x_set[:,0].min()-1, stop=x_set[:,0].max()+1, step=0.01),
np.arange(start = x_set[:,1].min()-1, stop=x_set[:,1].max()+1, step=0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             cmap=ListedColormap(('red','green')))

# Now we will plot the training data
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(x_set[y_set==j, 0],
                x_set[y_set==j, 1], color=ListedColormap(("red","green"))(i),
                label=j)
plt.show()

## Model Evaluation using Confusion Matrix
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix: \n", cm)
cr = classification_report(y_test, y_pred)
accs = accuracy_score(y_test, y_pred)
print("classification_report: \n", cr)
print("accuracy_score: ", accs)

import sklearn.tree

link = "https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/5_Ads_Success.csv"
link = "D:\\datasets\\5_Ads_Success.csv"
import pandas as pd
df = pd.read_csv(link)
X = df.iloc[:,1:4].values
y = df.iloc[:,4].values

from sklearn.preprocessing import LabelEncoder
lc = LabelEncoder()
X[:,0] = lc.fit_transform(X[:,0] )

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.25, random_state=100)

# Scaling as Age and Salary are in different range of values
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)  # use transform only: the scaler is already fit on training data

## Build the model
”’
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion=”gini”)
”’
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=39, criterion=“gini”)
classifier.fit(X_train,y_train)
y_pred = classifier.predict(X_test)

# visualize the outcome
X_train = X_train[:,1:]
X_test = X_test[:,1:]
classifier.fit(X_train,y_train)
y_pred = classifier.predict(X_test)
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt
import numpy as np
x_set, y_set = X_train, y_train
X1,X2 = np.meshgrid(np.arange(start = x_set[:,0].min()-1, stop=x_set[:,0].max()+1, step=0.01),
np.arange(start = x_set[:,1].min()-1, stop=x_set[:,1].max()+1, step=0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             cmap=ListedColormap(('red','green')))

# Now we will plot the training data
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(x_set[y_set==j, 0],
                x_set[y_set==j, 1], color=ListedColormap(("red","green"))(i),
                label=j)
plt.show()

## Model Evaluation using Confusion Matrix
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix: \n", cm)
cr = classification_report(y_test, y_pred)
accs = accuracy_score(y_test, y_pred)
print("classification_report: \n", cr)
print("accuracy_score: ", accs)

”’
# Show decision tree created

output = sklearn.tree.export_text(classifier)
print(output)
# visualize the tree
fig = plt.figure(figsize=(40,60))
tree_plot = sklearn.tree.plot_tree(classifier)
plt.show()
”’

”’
In Ensemble Algorithms – we run multiple algorithms to improve the performance
of a given business objective:
1. Boosting: When you run same algorithm – Input varies based on weights
2. Bagging: When you run same algorithm – average of all
3. Stacking: Over different algorithms – average of all
”’

import sklearn.tree

link = "https://raw.githubusercontent.com/swapnilsaurav/MachineLearning/master/5_Ads_Success.csv"
link = "D:\\datasets\\5_Ads_Success.csv"
import pandas as pd
df = pd.read_csv(link)
X = df.iloc[:,1:4].values
y = df.iloc[:,4].values

from sklearn.preprocessing import LabelEncoder
lc = LabelEncoder()
X[:,0] = lc.fit_transform(X[:,0] )

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.25, random_state=100)

# Scaling as Age and Salary are in different range of values
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)  # use transform only: the scaler is already fit on training data

## Build the model
”’
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion=”gini”)

from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=39, criterion=”gini”)
”’
from sklearn.ensemble import AdaBoostClassifier
classifier = AdaBoostClassifier(n_estimators=7)
classifier.fit(X_train,y_train)
y_pred = classifier.predict(X_test)

# visualize the outcome
X_train = X_train[:,1:]
X_test = X_test[:,1:]
classifier.fit(X_train,y_train)
y_pred = classifier.predict(X_test)
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt
import numpy as np
x_set, y_set = X_train, y_train
X1,X2 = np.meshgrid(np.arange(start = x_set[:,0].min()-1, stop=x_set[:,0].max()+1, step=0.01),
np.arange(start = x_set[:,1].min()-1, stop=x_set[:,1].max()+1, step=0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             cmap=ListedColormap(('red','green')))

# Now we will plot the training data
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(x_set[y_set==j, 0],
                x_set[y_set==j, 1], color=ListedColormap(("red","green"))(i),
                label=j)
plt.show()

## Model Evaluation using Confusion Matrix
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix: \n", cm)
cr = classification_report(y_test, y_pred)
accs = accuracy_score(y_test, y_pred)
print("classification_report: \n", cr)
print("accuracy_score: ", accs)

”’
# Show decision tree created

output = sklearn.tree.export_text(classifier)
print(output)
# visualize the tree
fig = plt.figure(figsize=(40,60))
tree_plot = sklearn.tree.plot_tree(classifier)
plt.show()
”’

”’
In Ensemble Algorithms – we run multiple algorithms to improve the performance
of a given business objective:
1. Boosting: When you run same algorithm – Input varies based on weights
2. Bagging: When you run same algorithm – average of all
3. Stacking: Over different algorithms – average of all
”’

# Reference: https://designrr.page/?id=36743&token=2022711066&type=FP&h=3547

from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

X, y = make_blobs(n_samples=300, n_features=3, centers=4)
plt.scatter(X[:,0], X[:,1])
plt.show()

from sklearn.cluster import KMeans
km = KMeans(n_clusters=5, init="random", max_iter=100)
y_cluster = km.fit_predict(X)

plt.scatter(X[y_cluster==0,0], X[y_cluster==0,1], c="blue",   label="Cluster A")
plt.scatter(X[y_cluster==1,0], X[y_cluster==1,1], c="red",    label="Cluster B")
plt.scatter(X[y_cluster==2,0], X[y_cluster==2,1], c="green",  label="Cluster C")
plt.scatter(X[y_cluster==3,0], X[y_cluster==3,1], c="black",  label="Cluster D")
plt.scatter(X[y_cluster==4,0], X[y_cluster==4,1], c="orange", label="Cluster E")
plt.show()

# Elbow method: plot distortion (inertia) against the number of clusters
distortion = []
max_centers = 30
for i in range(1, max_centers):
    km = KMeans(n_clusters=i, init="random", max_iter=100)
    km.fit(X)
    distortion.append(km.inertia_)

print("Distortion:\n", distortion)
plt.plot(range(1, max_centers), distortion, marker="o")
plt.show()

import pandas as pd
import matplotlib.pyplot as plt
link = "D:\\Datasets\\USArrests.csv"
df = pd.read_csv(link)
#print(df)
X = df.iloc[:,1:]
from sklearn.preprocessing import normalize
data = normalize(X)
data = pd.DataFrame(data)
print(data)

## plotting the dendrogram
import scipy.cluster.hierarchy as sch
dendo = sch.dendrogram(sch.linkage(data, method='ward'))
plt.axhline(y=0.7, color="red")
plt.show()

link = "D:\\datasets\\Market_Basket_Optimisation.csv"
import pandas as pd
df = pd.read_csv(link)
print(df)
from apyori import apriori
transactions = []
for i in range(len(df)):
    if i % 100 == 0:
        print("I = ", i)
    transactions.append([str(df.values[i,j]) for j in range(20)])

## remove nan from the list
print("Transactions:\n", transactions)

association_algo = apriori(transactions, min_confidence=0.2, min_support=0.02, min_lift=2)
print("Association = ", list(association_algo))

”’
Time Series Forecasting – ARIMA method

1. Read and visualize the data
2. Stationary series
3. Optimal parameters
4. Build the model
5. Prediction
”’
import pandas as pd
#Step 1: read the data
link = "D:\\datasets\\gitdataset\\AirPassengers.csv"
air_passengers = pd.read_csv(link)

”’
#Step 2: visualize the data
import plotly.express as pe
fig = pe.line(air_passengers,x=”Month”,y=”#Passengers”)
fig.show()
”’
# Cleaning the data
from datetime import datetime
air_passengers['Month'] = pd.to_datetime(air_passengers['Month'])
air_passengers.set_index('Month', inplace=True)

# converting to time series data
import numpy as np
ts_log = np.log(air_passengers['#Passengers'])
# creating rolling period - 12 months
import matplotlib.pyplot as plt
'''
moving_avg = ts_log.rolling(12).mean()
plt.plot(ts_log)
plt.plot(moving_avg)
plt.show()
'''
# Step 3: Decomposition into: trend, seasonality, error (or residual or noise)
'''
Additive decomposition: linear combination of the 3 factors:
Y(t) = T(t) + S(t) + E(t)

Multiplicative decomposition: product of the 3 factors:
Y(t) = T(t) * S(t) * E(t)
'''
from statsmodels.tsa.seasonal import seasonal_decompose
decomposed = seasonal_decompose(ts_log, model="multiplicative")
decomposed.plot()
plt.show()

# Step 4: Stationarity test
'''
For time series analysis, the series should be stationary.
A time series is said to be stationary if its statistical properties
(mean, variance, autocorrelation) do not change by a large value
over a period of time.
Types of tests:
1. Augmented Dickey-Fuller (ADF) test
2. Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test
3. Phillips-Perron (PP) test

Null Hypothesis: the time series is not stationary
Alternate Hypothesis: the time series is stationary
If p < 0.05 we reject the Null Hypothesis
'''
from statsmodels.tsa.stattools import adfuller
result = adfuller(air_passengers['#Passengers'])
print("ADF Stats: \n", result[0])
print("p value = ", result[1])
'''
To reject the Null Hypothesis, result[0] should be less than the 5% critical value
and p should be < 0.05
'''

# Run the model
'''
ARIMA model: Auto-Regressive Integrated Moving Average
AR: p predicts the current value from past values
I: d removes the trend and seasonality component by differencing previous periods
MA: q represents the Moving Average of past errors

AIC - Akaike's Information Criterion: helps to find optimal p, d, q values
BIC - Bayesian Information Criterion: alternative to AIC
'''
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plot_acf(air_passengers['#Passengers'].diff().dropna())
plot_pacf(air_passengers['#Passengers'].diff().dropna())
plt.show()
'''
How to read the graphs above:
To find q (MA), look at the Autocorrelation graph and see where there is a drastic change:
here it is at 1, so q = 1 (or 2, as at 2 it goes negative)

To find p (AR), look for a sharp drop in the Partial Autocorrelation graph:
here it is at 1, so p = 1 (or 2, as at 2 it goes negative)

For d (I) we need to try multiple values;
initially we will take it as 1
'''
”’
Time Series Forecasting – ARIMA method

1. Read and visualize the data
2. Stationary series
3. Optimal parameters
4. Build the model
5. Prediction
”’
import pandas as pd
#Step 1: read the data
link = “D:\\datasets\\gitdataset\\AirPassengers.csv”
air_passengers = pd.read_csv(link)

”’
#Step 2: visualize the data
import plotly.express as pe
fig = pe.line(air_passengers,x=”Month”,y=”#Passengers”)
fig.show()
”’
# Cleaning the data
from datetime import datetime
air_passengers[‘Month’] = pd.to_datetime(air_passengers[‘Month’])
air_passengers.set_index(‘Month’,inplace=True)

#converting to time series data
import numpy as np
ts_log = np.log(air_passengers[‘#Passengers’])
#creating rolling period – 12 months
import matplotlib.pyplot as plt
”’
moving_avg = ts_log.rolling(12).mean
plt.plot(ts_log)
plt.plot(moving_avg)
plt.show()
”’
#Step 3: Decomposition into: trend, seasonality, error ( or residual or noise)
”’
Additive decomposition: linear combination of above 3 factors:
Y(t) =T(t) + S(t) + E(t)

Multiplicative decomposition: product of 3 factors:
Y(t) =T(t) * S(t) * E(t)
”’
from statsmodels.tsa.seasonal import seasonal_decompose
decomposed = seasonal_decompose(ts_log,model=“multiplicative”)
decomposed.plot()
plt.show()

# Step 4: Stationary test
”’
To make Time series analysis, the TS should be stationary.
A time series is said to be stationary if its statistical properties
(mean, variance, autocorrelation) doesnt change by a large value
over a period of time.
Types of tests:
1. Augmented Dickey Fuller test (ADH Test)
2. Kwiatkowski Phillips Schnidt Shin (KPSS) test
3. Phillips Perron (PP) Test

Null Hypothesis: The time series is not stationary
Alternate Hypothesis: Time series is stationary
If p >0.05 we reject Null Hypothesis
”’
from statsmodels.tsa.stattools import adfuller
result = adfuller(air_passengers[‘#Passengers’])
print(“ADF Stats: \n,result[0])
print(“p value = “,result[1])
”’
To reject Null hypothesis, result[0] less than 5% critical region value
and p > 0.05
”’

# Run the model
”’
ARIMA model: Auto-Regressive Integrative Moving Average
AR: p predicts the current value
I: d integrative by removing trend and seasonality component from previous period
MA: q represents Moving Average

AIC- Akaike’s Information Criterion (AIC) – helps to find optimal p,d,q values
BIC – Bayesian Information Criterion (BIC) – alternative to AIC
”’
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plot_acf(air_passengers[‘#Passengers’].diff().dropna())
plot_pacf(air_passengers[‘#Passengers’].diff().dropna())
plt.show()
”’
How to read above graph:
To find q (MA), we look at the Autocorrelation graph and see where there is a drastic change:
here, its at 1, so q = 1 (or 2 as at 2, it goes to -ve)

To find p (AR) – sharp drop in Partial Autocorrelation graph:
here, its at 1, so p = 1 (or 2 as at 2, it goes to -ve)

for d (I) – we need to try with multiple values
intially we will take as 1

”’
from statsmodels.tsa.arima.model import ARIMA
model = ARIMA(air_passengers['#Passengers'], order=(1,1,1))
result = model.fit()
plt.plot(air_passengers['#Passengers'])
plt.plot(result.fittedvalues)
plt.show()
print("ARIMA Model Summary")
print(result.summary())

model = ARIMA(air_passengers['#Passengers'], order=(4,1,4))
result = model.fit()
plt.plot(air_passengers['#Passengers'])
plt.plot(result.fittedvalues)
plt.show()
print("ARIMA Model Summary")
print(result.summary())

# Prediction using the ARIMA model
air_passengers['Forecasted'] = result.predict(start=120, end=246)
air_passengers[['#Passengers','Forecasted']].plot()
plt.show()

# predict using the SARIMAX model
import statsmodels.api as sm
model = sm.tsa.statespace.SARIMAX(air_passengers['#Passengers'], order=(7,1,1), seasonal_order=(1,1,1,12))
result = model.fit()
air_passengers['Forecast_SARIMAX'] = result.predict(start=120, end=246)
air_passengers[['#Passengers','Forecast_SARIMAX']].plot()
plt.show()

# Datasets: https://drive.google.com/drive/folders/1Xe3HftLxL1T6HsEBUfjq_zXANjTnr6Cz?usp=drive_link

”’
NLP – Natural Language Processing – analysing review comment to understand
reasons for positive and negative ratings.
concepts like: unigram, bigram, trigram

Steps we generally perform with NLP data:
1. Convert into lowercase
2. decompose (non unicode to unicode)
3. removing accent: encode the content to ascii values
4. tokenization: will break sentence to words
5. Stop words: not important words for analysis
6. Lemmetization (done only on English words): convert the words into dictionary words
7. N-grams: set of one word (unigram), two words (bigram), three words (trigrams)
8. Plot the graph based on the number of occurrences and Evaluate
”’
”’
cardboard mousepad. Going worth price! Not bad
”’

link = "https://raw.githubusercontent.com/swapnilsaurav/OnlineRetail/master/order_reviews.csv"
import pandas as pd
import unicodedata
import nltk
import matplotlib.pyplot as plt
df = pd.read_csv(link)
print(list(df.columns))
'''
['review_id', 'order_id', 'review_score', 'review_comment_title',
'review_comment_message', 'review_creation_date', 'review_answer_timestamp']
'''
df['review_creation_date'] = pd.to_datetime(df['review_creation_date'])

df['review_answer_timestamp'] = pd.to_datetime(df['review_answer_timestamp'])

# data preprocessing - making the data ready for analysis
reviews_df = df[df['review_comment_message'].notnull()].copy()
#print(reviews_df)
'''
Write a function to perform basic preprocessing steps
'''
def basic_preprocessing(text):
    txt_pp = text.lower()
    print(txt_pp)
    # remove accents (to be completed)
    return txt_pp  # return the processed text so apply() keeps the column populated

# applying basic preprocessing:
reviews_df['review_comment_message'] = \
    reviews_df['review_comment_message'].apply(basic_preprocessing)