2 Types of methods : supervised methods vs non supervised methods Problems that require data : supervised problems
Regression example : predict reasonable price for a used car predict price for real estate in the city
meaning what we want to predict is continous
Classification : discreet, only so many values we could choose
- news paper article : predict what topic it is, news, sports,
- determine disease : either a patient has covid, or they dont
it is possible to cast regression problems to classification problems either a used is interested in a movie, or they are not At the End, either yes or no, or : politics, sports, etc. methods under the hood predict probability values
Clustering : Given Data, group them, make sense of them, Data points in a cluster have a high coherence, DP in different clusters are not coherent
After Clustering : goal is to have a typical Classification of the data points Grouping based on contents of the data
Neural Networks :
back to 1940s, originally inspired by human brain of mammals works People didnt understand how to train neural networks
~2010 Neural Networks got a revival training on GPUs is much better than on CPUs LLMs are based on a special form of neural networks : transformers
we can learn from Models : a car that has been driven a million miles has a worse condition than cars with fewer miles
In the last 20 years Machine Learning has become common, before companies didnt record data, nowadays there is much more data to be used
Recommend Systems, search, etc. are being driven by Machine Learning
there is structured data, and semi-structured data we will focus on structured data
rows : datapoints columns : features
there are also un-structured and semi-structured data
un-structured : sequence of characters, example : E-Mail semi-structured : content doesnt have a fixed structure
many methods rely on structured data
we are not interested in data types, but types of features german : ” skalen niveau ” Essentially : what operations can we apply to them
1 Nominal Features ( =, != )
we can compare two values → is it the same value ?
e.g gender of person, color of car
what is the most frequent value ?
We can not compute anything with it `
does it make sense to compute a mean ?
2 Ordinal Features ( < , > )
We can do everything we can do with nominal features and more
There is a natural order in features :
e.g customer satisfaction level, energy class of car
A - customer is really satisfied B - customer is okay with it C…
3 Numerical Features ( +, -, * , / )
anything we can do with nominal and ordinal feaetures Numerical Features allow arithmetic operations :
e.g income of household, number of emissions,
We can : compute difference, mean, variance,
EstateType : Nominal Feature EnergyClass : Ordinal Features MonthlyRent, Area, DistanceToCenter : Numerical Features
Supervised vs Unsupervised Learning
Supervised methods assume that training data is available → target feature Unsuoervised Learning : task is not to structure data, but find out what DP do not look like other data points which temperature points do not make sense → temp sensor recording 2000º
Classification
Decision Trees being used to determine e.g a Topic of an article We can see why a document was classified as a sports, politics document
Neural Networks
backbone of artifical intelligence we learn to approximate arbitrary functions any function that evaluates to true / false
Supervised Methods
Require Pior Data to learn from, and be able to predict a target feature
Unsupervised Methods
Don’t require previous Data, Try to make sense out of the Data itself, e.g group DataPoints, or locate Outliers
Semi-supervised Learning
Takes both labeled Data (for which the target feaeture is known) And unlabeled Data (for which target is unknown)
Reinforcement Learning
Learns to choose suitable Actions to be able to maximize a Reward Function
Remember :
Types of Features : Nominal, Ordinal, Numerical Features …
Chapter 2 : Regression
Aims at predicting a numerical Target Feature
One ore more features, we want to predict another feature, they are numerical we need to make a model assumption
x-axis feature is given y-axis feature we want to predict
for hte training data, we know the correct value for y
we identify a plane, find a plane that best approximates data points how can we fit more complex data points ?
Chapter 3 : Classification
Aims at prediciting a nominal target feature based on one or multiple numerical features
Example applications:
-
classify an incoming e-mail as spam or not-spam
-
determine sentiment (e.g., negative, neutral, positive) of a customer review for a specific product
Chapter 4 : Clustering
Aims at revealing the structure of data by grouping data points
Similiar Points are grouped, dissimilar ones are separated
Example applications:
-
identify segments of customers based on their purchase histories (e.g., for marketing campaigns)
-
group newspaper articles according to topic (e.g., to build a portal organized by topics)
Neural Networks
Are the BackBone of Modern Aritifial Intelligence
Drive Image recognition, Large Language models
Summary
Machine learning aims at gaining insights from data and making predictions based on it
-
Nominal, ordinal, and numerical features differ in the operations that can be applied to them
-
Regression and classification methods are supervised and predict a numerical and nominal target feature, respectively based on other (numerical) features
-
Clustering methods are unsupervised and group similar data points together