Data Mining & Knowledge Discovery

(Mineração de Dados e Descoberta de Conhecimento)

DAT003/CAIA003 - CPGEI & PPGCA

     

Last update: 09/10/2023 23:38


General issues:


Tentative schedule for 2023-3 (subject to change):

Week

Date

Lecturer

Subject

Lecture notes

Software/data

Support videos

1

October 10th

Heitor / Thiago

Introduction: the data mining & knowledge discovery process. Presentation of real-world case studies. Data collection (web crawling & web scraping), dataset construction.

1a-Intro

1b-examples-T

1c-Examples-H

1d-Data-Collection

 

What is data mining (2'53"), Dataset creation (7'41"), Web scraping vs. web crawling (3'13")

2

October 17th

Thiago / Heitor

Types of data and their analysis. Data visualization. Recommended tools.

 

 

Orange software

Orange basics (12'44"), Workflow with Orange (13'05"), Orange playlist, Power BI (60'05")

3

October 24th

Heitor

Classification task: Decision trees. Models, concepts and evaluation metrics.

StatQuest: Decision Trees (17'21"), Hunt's algorithm (13'02")

4

October 31st

Thiago

Classification task: Decision rules. Bagging and boosting. Regression: linear regression

     

5

November 7th

Heitor

Associative analysis task: frequent and infrequent pattern discovery

  vote.arff

new-fertility.arff

new-breadbasket.arff

 

6

November 14th

Thiago

Clustering task: K-means, hierarchical clustering, cluster quality

K-means algorithm, step by step (23'30"), Introduction to hierarchical clustering (12'25")

7

November 21st

Heitor

Feature selection, dimensionality reduction, Principal Component Analysis (PCA)

 

pima-indians.arff

 

 

8

November 28th

Thiago

Text mining: main tools and techniques

 

 

9

December 5th

Heitor

Images, signals and time-series mining

 

butterfly_vs_owl.zip

estilos_de_carros.zip

Image classification with Orange

10

December 12th

Thiago / Heitor

Anomaly detection. Short presentation and discussion of ideas for the final project.

     

11

December 19th

Heitor / Thiago

PROJECT PROPOSAL DUE: Presentation and discussion of proposals for the final project, including objective, dataset construction, methods, and analysis. Proposals will be reviewed, and approval or a request for resubmission will be sent to students by e-mail.

 

           
 

February 20th, 2024

Heitor / Thiago

PROJECT REPORT DUE: Full paper-style report, along with code and data.
 

February 27th, 2024

Heitor / Thiago

ORAL PRESENTATION: Live seminar presenting the results of the final project.

 


Homework:

HW#

Subject

Due date

Link

Datasets

Upload link

1

Dataset creation & simple data analysis        

 

 

Moodle

 

2

Classification task with decision trees      

3

Classification task with decision rules      

4

Associative Analysis      

5

Clustering Analysis      

6

Feature Selection & Dimensionality Reduction      

7

Image Mining      

8

Text Mining      

9

       
10      

 


Support materials and links: