Spark SQL is Apache Spark’s module for working with structured data. The SQL Syntax section describes the SQL syntax in detail along with usage examples when applicable. This document provides a list of Data Definition and Data Manipulation Statements, as well as Data Retrieval and Auxiliary Statements.
THIS IS THE 4th and the last part of the complete Data Science with Python course started 3 months ago!!
#NLP – Natural Language processing:
# sentiments: Positive, Neutral, Negative
#
”’
we will use nltk library for NLP:
pip install nltk
”’
import nltk
#1. Convert into lowercase
text = “Product is great but
I amn’t liking the colors as they are worst”
text = text.lower()
”’
2. Tokenize the content: break it into words or sentences
”’
text1 = text.split()
#using nltk
from nltk.tokenize import
sent_tokenize,word_tokenize
text = word_tokenize(text)
#print(“Text =\n”,text)
#print(“Text =\n”,text1)
”’
3. Removing Stop words: Words which are not significant
for your analysis. E.g. an, a, the, is, are
”’
my_stopwords = [‘is’,‘i’,‘the’]
text1 = text
for w in text1:
if w in my_stopwords:
text.remove(w)
print(“Text
after my stopwords:”,text1)
nltk.download(“stopwords”)
from nltk.corpus import
stopwords
nltk_eng_stopwords = set(stopwords.words(“english”))
#print(“NLTK list of stop words in English:
“,nltk_eng_stopwords)
”’
Just for example: we see the word but in the STOP WORDS but
we want to include it, then we need to remove the word from the set
”’
# removing but from the NLTK stop words
nltk_eng_stopwords.remove(‘but’)
for w in text:
if w in nltk_eng_stopwords:
text.remove(w)
print(“Text
after NLTK stopwords:”,text)
”’
4. Stemming: changing the word to its root
eg: {help: [help, helped, helping, helper]}
One of the method is Porter stemmer
”’
from nltk.stem import
PorterStemmer
stemmer = PorterStemmer()
text = [stemmer.stem(w) for w in text]
”’ above line is like below:
t_list=[]
for w in text:
a = stemmer.stem(w)
t_list.append(a)
”’
print(“Text
after Stemming:”,text)
”’
5. Part of Speech Tagging (POS Tagging)
grammatical word which deals with the roles they place
like – 8 parts of speeches – noun, verb, …
Reference: https://www.educba.com/nltk-pos-tag/
POS Tagging will give Tags like
CC: It is the conjunction of coordinating
CD: It is a digit of cardinal
DT: It is the determiner
EX: Existential
FW: It is a foreign word
IN: Preposition and conjunction
JJ: Adjective
JJR and JJS: Adjective and superlative
LS: List marker
MD: Modal
NN: Singular noun
NNS, NNP, NNPS: Proper and plural noun
PDT: Predeterminer
WRB: Adverb of wh
WP$: Possessive wh
WP: Pronoun of wh
WDT: Determiner of wp
VBZ: Verb
VBP, VBN, VBG, VBD, VB: Forms of verbs
UH: Interjection
TO: To go
RP: Particle
RBS, RB, RBR: Adverb
PRP, PRP$: Pronoun personal and professional
But to perform this, we need to download any one tagger:
e.g. averaged_perceptron_tagger
nltk.download(‘averaged_perceptron_tagger’)
”’
nltk.download(‘averaged_perceptron_tagger’)
import nltk
from nltk.tag import
DefaultTagger
py_tag = DefaultTagger (‘NN’)
tag_eg1 = py_tag.tag ([‘Example’, ‘tag’])
print(tag_eg1)
#txt = “Example of nltk pos tag list”
#txt = [‘product’, ‘great’, ‘but’, “not”, ‘like’, ‘color’]
#txt = word_tokenize(txt)
#txt = [‘Example’,’of’,’nltk’,’pos’,’tag’,’list’]
pos_txt = nltk.pos_tag(text)
print(“POS
Tagging:”, pos_txt)
”’
6. Lemmetising
takes a word to its core meaning
We need to download: wordnet
”’
nltk.download(‘wordnet’)
from nltk.stem import
WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
print(“Very
good = “,lemmatizer.lemmatize(“very good”))
print(“Halves
= “,lemmatizer.lemmatize(“halves”))
text = “Product is great but I amn’t liking the colors
as they are worst”
text = word_tokenize(text)
text = [lemmatizer.lemmatize(w) for w in text]
print(“Text
after Lemmatizer: “,text)
This page is a live document and will be updated with content as we make progress
=====================
# Exceptions
a=“k”
b=10
c=-1
try:
c = b/d
except ZeroDivisionError:
print(“Denominator is zero hence stopping the program from executing”)
except TypeError:
print(“Invalid numbers, hence exiting…”)
except NameError:
print(“One of the values has not been defined. Try again”)
except Exception:
print(“Not sure but some error has occurred, we need to stop”)
else:
print(“Answer is”,c)
finally:
print(“We have completed division process”)
#
class InvalidLength(Exception):
def __init__(self,value=0):
self.value = value
length, breadth = –1,-1
while True:
try:
length = int(input(“Enter length: “))
except ValueError:
print(“Invalid number, try again…”)
else:
#assert length > 0, “Rectangle with this diamension is not possible”
if length <=0:
try:
raise InvalidLength
except InvalidLength:
print(“Invalid value for Length hence resetting the value to 1”)
length=1
break
while True:
try:
breadth = int(input(“Enter breadth: “))
except ValueError:
print(“Invalid number, try again…”)
else:
assert breadth>0,“Rectangle with this diamension is not possible”
break
area = length * breadth
print(“Area of the rectangle is”,area)
## ### ##
# datetime, date, time
from datetime import datetime, timedelta
import time
from pytz import timezone
curr_time = datetime.now()
print(“Current time is”,curr_time)
print(“Current time is”,curr_time.strftime(“%d / %m /%Y”))
print(curr_time.year, curr_time.day, curr_time.date())
for i in range(5):
time.sleep(1) # sleep for 2 seconds
print(“Time left:”,5-i,“seconds”)
print(“Good Morning”)
print(“Current time is”,datetime.now())
print(“Date 2 days back was”,(curr_time-timedelta(days=2)).strftime(“%d/%m/%Y”))
print(“UTC Time is”,datetime.now(timezone(‘UTC’)))
print(“UTC Time is”,datetime.now(timezone(‘US/Eastern’)))
print(“UTC Time is”,datetime.now(timezone(‘Asia/Kolkata’)))
Download MYSQL database link:
https://dev.mysql.com/downloads/installer/
Create table employees.Employees( EMPID INT Primary Key auto_increment, FNAME VARCHAR(55) NOT NULL, LNAME VARCHAR(55), DOB DATE, EMAIL VARCHAR(35) unique, PHONE VARCHAR(11), DOJ DATE Default(‘2021-07-20’), — YYYY-MM-DD SALARY FLOAT(2), DEPTID INT, Foreign Key (DEPTID) References Departments(DID), Constraint U_UC_LN_DOB Unique(LNAME,DOB), CHECK(Salary 0.0) ) — Constraints: Primary Key, Foreign Key, Not Null, Unique, Check, Default
— Modifying a table – Table is already created and in use – ALTER TABLE
— ADD or DELETE (DROP) or MODIFY or RENAME a Column
— DDL command to delete is DROP
— DDL command to modify is ALTER
use employees;
ALTER table employees ADD BONUS Float(3);
ALTER table employees ADD dummy Float(3);
ALTER TABLE Employees DROP COLUMN dummy;
ALTER TABLE EMPLOYEES MODIFY COLUMN BONUS float(4)
ALTER TABLE EMPLOYEES RENAME COLUMN BONUS to BONUS_PCT;
use employees;
— create table, drop table, alter table
— to read data we use Select
Select * from departments;
insert into departments (DID, HOD, DNAME, DCODE) values (100,’Sachin’,’CSE’,’AA00′);
insert into departments values (101,’MEC’,’Virat’,’AZ00′);
insert into departments values (103,’ECE’,’Rohit’,’KZ00′);
insert into departments values (105,’CIV’,’Dhoni’,’CZ00′);
insert into departments values (106,’TCE’,’Sunil’,’BZ00′);
Select * from employees;
Select @@GLOBAL.secure_file_priv;
— Import data into CSV
LOAD DATA INFILE ‘C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/EmployeesData.csv’
INTO TABLE EMPLOYEES
FIELDS TERMINATED BY ‘,’
ENCLOSED BY ‘”‘
LINES TERMINATED BY ‘\n’
IGNORE 1 ROWS;
— DELETE to remove a row
DELETE FROM EMPLOYEES WHERE EMPID=120;
— Update to modify the existing values in a row
Update Employees
Set Bonus_PCT = 0.45, Salary = 160000
where empid=122;
Select * from employees where empid=122
— SELECT
Select FNAME, EMAIL, PHONE, SALARY From Employees;
Select FNAME, EMAIL, PHONE, SALARY From Employees where salary >=100000;
Select FNAME, EMAIL, PHONE, SALARY From Employees where EMAIL =’ramakrishnavendra@wnbco.co’;
— Create Index to make query faster
Create Index idx_salary
on Employees (Salary)
— Relational operators in MYSQL: = > < >= <= <>
Select FNAME, EMAIL, PHONE, SALARY From Employees where EMAIL <>’ramakrishnavendra@wnbco.co’;
— BETWEEN LIKE IN
use employees;
select * from employees;
select FNAME, LNAME, DOB, Salary from employees where salary > 75000;
— BETWEEN is used for numbers
select FNAME, LNAME, DOB, Salary from employees where salary BETWEEN 75000 AND 95000;
— below statement is same as above (BETWEEN)
select FNAME, LNAME, DOB, Salary from employees where salary >= 75000 AND salary <= 95000;
select FNAME, LNAME, DOB, Salary from employees where salary >= 75000 AND Bonus_pct <= 0.4; — 0 to 9 rows
select FNAME, LNAME, DOB, Salary from employees where salary >= 75000 OR Bonus_pct <= 0.4; — 25 to 34 rows
select FNAME, LNAME, DOB, Salary from employees where salary >= 75000 — 25 rows
select FNAME, LNAME, DOB, Salary from employees where Bonus_pct <= 0.4; — 9 rows
— LIKE – for text data comparison
Select * from employees where FNAME like ‘A%’
— IN – checking for given set of values
Select * from employees where FNAME IN (‘O%’, ‘A%’,’B%’);
Select * from employees where FNAME IN (‘Anikeet Dey’,’Amol Jain’,’Suresh Narayan Singh Yadav’ );
Select * from employees where SALARY IN (36107, 110266, 107799, 198890);
— Logical operator – AND OR NOT
select * from employees where not( deptid = 100 and salary >100000)
— ORDER BY
select * from employees order by salary DESC;
select * from employees order by salary ASC; — Ascending order is by default
select * from employees order by DEPTID ASC, Salary DESC;
use employees;
select * from employees where bonus_pct is not null;
select * from employees order by FNAME;
select * from employees order by FNAME DESC;
UPDATE employees SET BONUS_PCT = NULL WHERE BONUS_PCT > 0.88;
select * from employees where bonus_pct is not null limit 3;
select * from employees where bonus_pct is not null order by salary DESC limit 5;
select avg(salary) as Max_Salary, deptid from employees group by deptid;
select avg(salary), count(salary), sum(salary) from employees;
select * from employees where salary > (select avg(salary) from employees);
— Find highest salary earner for each department
select deptid, MAX(salary) as max_salary FROM employees group by deptid;
select deptid, fname, salary from employees where (Deptid, salary) in
(select deptid, MAX(salary) as max_salary FROM employees group by deptid);
use employees;
select deptid, avg(salary) as “Average Salary” from employees group by deptid;
— COUNT, AVG, SUM, MIN, MAX
— Like: %a, a%, %a% , %b__% (exactly 1 value for _), ‘a%z’
select fname, email, salary from employees;
select fname, email, salary from employees where fname like ‘%ol%’;
— J followed by exactly three characters
select fname, email, salary from employees where fname like ‘%J___’;
— J followed by atleast four characters
select fname, email, salary from employees where fname like ‘%J____%’;
select * from employees where fname in (‘Pankaj Prajapati’,’Manju Mishra’ ,’Arijeet Dasgupta’);
select * from employees where fname in (select fname from employees where fname like ‘%J____%’);
select * from employees where (fname,salary) in (select fname,salary from employees where fname like ‘%J____%’);
select fname, email, salary from employees where fname like ‘%dey’;
— combining data from 2 tables
use employees;
select * from employees;
select employees.FNAME, DID, departments.DNAME from employees, departments
select employees.FNAME, DID, departments.DNAME from employees, departments where employees.deptid = departments.did;
insert into departments values (107, ‘BIOT’,’Kapil’,’BB09′)
insert into employees (EMPID, FNAME, LNAME, DOB, EMAIL,PHONE, DOJ,SALARY)
values (133,’Sachin Tendulkar’,’T’,’1990-05-04′,’sachin@wnbco.co’,9999000009,’2023-05-15′,231456);
select * from departments;
select employees.FNAME, DID, departments.DNAME from employees, departments where employees.deptid = departments.did;
— JOINS
use employees;
select * from employees;
select * from departments;
select fname, hod from employees, departments where departments.did = employees.DEPTID; — Inner
select fname, hod from employees, departments where employees.DEPTID = departments.did order by empid;
select fname, hod from employees INNER JOIN departments ON employees.DEPTID = departments.did order by empid;
SELECT T1.COL1, T2.COL2, T3.COL3
FROM (((T1 INNER JOIN T2 ON T1.K1 = T2.K1) INNER JOIN T3 ON T2.K2 = T3.K2) INNER JOIN T4 ON T3.K3 = T4.K3);
select fname, hod from employees LEFT JOIN departments ON employees.DEPTID = departments.did order by empid;
use employees;
select fname, hod from employees left join departments on employees.DEPTID = departments.did
UNION
select fname, hod from departments left join employees on employees.DEPTID = departments.did;
select fname, hod from departments inner join employees on employees.DEPTID = departments.did;
select * from employees;
select sum(EMPID) from employees;
select count(EMPID) from employees;
select count(EMPID) from employees where salary > 50000;
select count(EMPID) , DEPTID from employees
group by deptid
having count(EMPID) > 5;
select count(fname), hod from employees left join departments on employees.DEPTID = departments.did
group by deptid
having count(fname) > 3;
select * from employees;
— Salary_Grade: <50K: E 50-100K: D 100-150K – C 150-200-B >200K: A
select fname, salary, bonus_pct,
case when bonus_pct is null then salary
else salary+salary*bonus_pct
end as total_salary from employees;
select fname, salary,
CASE
when salary < 50000 Then ‘Grade: E’
when salary > 200000 Then ‘Grade: A’
when salary > 150000 Then ‘Grade: B’
when salary > 100000 Then ‘Grade: C’
Else ‘Grade: D’
End As Salary_Grade
from employees;
use employees;
select * from employees;
select FNAME, DOB, DOJ, datediff(doj,dob) from employees;
select fname, NOW() from employees;
select now(); — current date and time (system)
select date(now());
select curdate();
— default date format in MYSQL is YYYY-MM-DD
— DD-MM-YYYY
select date_format(curdate(), ‘%d-%m-%Y’) as Date;
select curdate() as todays_date, date_add(curdate(),interval 1 Day) as Tomorrow,
date_add(curdate(),interval 1 Week) as next_week,
date_add(curdate(),interval 1 Month) as Next_month,
date_add(curdate(),interval 1 Year) as Next_year;
— Add date: Interval, day, week or month or year;
select curdate() as todays_date, date_sub(curdate(),interval 1 Day) as Yesterday,
date_sub(curdate(),interval 1 Week) as last_week,
date_sub(curdate(),interval 1 Month) as last_month,
date_sub(curdate(),interval 1 Year) as last_year;
select day(curdate()) as day, month(curdate()) as Month, quarter(curdate()) as Quarter, Year(curdate()) as Year,
Weekday(curdate()) as Weekday, week(curdate()) as week, weekofyear(curdate()) as WeekofYear;
select * from employees where salary > (select salary from employees where empid = 104);
— create a stored procedure
DELIMITER $$
Create Procedure GetEmpInfo()
Begin
select * from employees where salary > (select salary from employees where empid = 104);
End $$
DELIMITER ;
call GetEmpInfo();
drop Procedure GetEmpInfo;
— Features: exception handling, conditions and loops (While, Repeat),
— Creating a View
create view EmpData
as
Select EMPID, FNAME, LNAME, DOB, EMAIL, PHONE, DOJ, DEPTID from Employees;
select * from empdata;
— String functions: LTRIM, RTRIM, Replace, substring, COncat
select concat(fname,’.’,lname,’@wnbco.co’) from employees;
select fname, substring(fname, 3), substring(fname, -3) from employees;
import pymysql
con_str = pymysql.connect(host=“localhost”, user=“root”, password=“learnSQL”,database=“employees”)
cursor = con_str.cursor()
q1 = ”’Create Table PY_EMP (
EMPID INT PRIMARY KEY,
NAME VARCHAR(30),
CITY VARCHAR(15))
”’
#cursor.execute(q1)
q2 = ”’INSERT INTO PY_EMP VALUES(1,’Laxman’,’Hyderabad’)”’
cursor.execute(q2)
q2 = ”’INSERT INTO PY_EMP VALUES(2,’Rahul’,’Bangalore’)”’
cursor.execute(q2)
q3 = ”’Select * from Py_Emp”’
cursor.execute(q3)
results = cursor.fetchall()
for row in results:
print(row)
con_str.commit()
con_str.close()
Write detailed steps for making tea and draw flowchart.
Flowchart and C Programs by Swapnil Saurav on Scribd
Pseudocode Book by Swapnil Saurav on Scribd
Calculate NetPay when:
Avg > 70: Grade B
Avg > 60: Grade C
Avg > 40: Grade D
Avg < 40: Grade E
There are numerous scientists who have made significant contributions to the field of machine learning. Some of the most notable figures include:
These are just a few examples of the numerous scientists who have made notable contributions to the field of machine learning. Many others have also played crucial roles in advancing the field and continue to do so.