Data Mining & Knowledge Discovery

(Mineração de Dados e Descoberta de Conhecimento)

DAT003/CAIA003 - CPGEI & PPGCA

     

last update: 19/04/2024 14:22


General issues:


Preliminary schedule for 2024-2 (subject to change):

Week

Date

Lecturer

Subject

Lecture notes

Software/data

Support videos

1

June 10th

Heitor / Thiago

Introduction: the data mining & knowledge discovery process. Presentation of real-world case studies. Data collection (web crawling & web scraping) and dataset construction.

What is data mining (2'53"), Dataset creation (7'41"), Web scraping vs. web crawling (3'13")

2

June 17th

Thiago / Heitor

Types of data and their analysis. Data visualization. Recommended tools.

 

 

Orange software

Orange basics (12'44"), Workflow with Orange (13'05"), Orange playlist, Power BI (60'05")

3

June 24th

Heitor

Classification task: Decision trees. Models, concepts and evaluation metrics. Decision rules.

StatQuest: Decision Trees (17'21"), Hunt's algorithm (13'02")

4

July 1st

Thiago

Classification task: bagging and boosting. Regression task: linear regression.

     

5

July 8th

Heitor

Associative analysis task: frequent and infrequent pattern discovery

  vote.arff

new-fertility.arff

new-breadbasket.arff

 

From July 10th to July 24th: winter school break (no classes)

6

August 5th

Thiago

Clustering task: K-means, hierarchical clustering, cluster quality evaluation

K-means algorithm, step by step (23'30"), Introduction to hierarchical clustering (12'25")

7

August 12th

Heitor

Feature selection, dimensionality reduction, Principal Component Analysis (PCA)

 

pima-indians.arff

 

 

8

August 19th

Thiago

Text mining: main tools and techniques

 

 

9

August 26th

Heitor

Image mining

 

butterfly_vs_owl.zip

estilos_de_carros.zip

 

10

September 2nd

Heitor

Time-series and signal mining

     

11

September 9th

Thiago / Heitor

Anomaly detection. Preliminary discussion of ideas for the final project.

12

September 16th

Heitor / Thiago

PROJECT PROPOSAL DUE: Short presentation and discussion of proposals (plans "A" and "B") for the final project, covering objectives, dataset construction, methods, and analysis. Proposals will be reviewed, and approval or a request for resubmission will be sent to the students by e-mail.
 

October 21st

Heitor / Thiago

PROJECT REPORT DUE: Full paper-style report, along with code and data
 

November 4th, 2024

Heitor / Thiago

ORAL PRESENTATION (1st round): Live seminar presenting the final project
 

November 11th, 2024

Heitor / Thiago

ORAL PRESENTATION (2nd round): Live seminar presenting the final project
       

 


Homework:

HW#

Subject

Date due

Link

Datasets

Upload link

1

Dataset creation & simple data analysis        

 

 

All interactions will take place through Moodle

 

2

Classification task with decision trees      

3

Classification task with decision rules      

4

Associative Analysis      

5

Clustering Analysis      

6

Feature Selection & Dimensionality Reduction      

7

Image Mining      

8

Text Mining      

9

       
10      

 


Support materials and links: