Data Mining & Knowledge Discovery

(Mineração de Dados e Descoberta de Conhecimento)

DAT003/CAIA003 - CPGEI & PPGCA

     

last update: 01/04/2025 10:38


General issues:


Tentative schedule for the 1st term of 2025 (subject to change):

Week

Date

Lecturer(s)

Subject

Lecture notes

Software/data

Links and videos

1

March 11th

(in-person class)

Heitor / Thiago

Introduction: the data mining & knowledge discovery process, web scraping, data, recommended tools

1a-Introduction

1b-Webscrapping

1c-Data

1d-Software Orange

orange-autoMPG.ows

orange-iris.ows

O que é data-mining (2'53"), DatasetCreation (7'41"), WebscrappingXWebcrawling (3'13")

Orange datamining software (download)

Orange playlist (English)

Orange básico (12'44"), Workflow com Orange (13'05")

2

March 18th (online class)

Heitor

Classification task: Decision trees/rules. Models, concepts and evaluation metrics.

 

2a-Decision Trees

 

2b-Decision Rules

hw1-2025.pdf

soybean.csv

contraceptive.csv

StatQuest: Árvores de Decisão (17'21"), Hunt algorithm (13'02")

3

March 25th (online class)

Thiago

Classification task: KNN, neural networks, bagging and boosting. Regression task: linear and polynomial regression.

     

4

April 1st (online class)

Heitor

Associative analysis task: frequent and infrequent pattern discovery

4-Associative Analysis

hw3-2025.pdf

very-low-weight-babies.csv

road_accidents_PR.xlsx

 

5

April 8th (online class)

Thiago

Clustering task: K-means, hierarchical clustering, cluster quality evaluation

Algoritmo K-means, passo-a-passo (23'30"), Introdução ao Agrupamento Hierárquico (12'25")

6

April 15th (online class)

Heitor

Feature selection task: dimensionality reduction, Principal Component Analysis (PCA)

7

April 22nd (online class)

Thiago

Text mining: main tools and techniques

 

8

April 29th (online class)

Heitor

Signal and time-series mining.

Preliminary discussion of ideas for the Final Project

   

 

 

9

May 6th (online class)

Heitor / Thiago

Image mining, anomaly detection

     

10

May 13th (in-person class)

Heitor / Thiago

PROJECT PROPOSAL DUE: Short presentation and discussion of proposals (plans "A" and "B") for the final project, including objectives, dataset construction, methods, and analysis. Proposals will be reviewed, and approval or a request for resubmission will be communicated to the students by e-mail.

11

May 20th (online class)

Heitor / Thiago

Resubmission of rejected proposals: short presentation of improved versions of the previously rejected projects (as suggested by the professors) or, possibly, new projects

 

     
 

June 17th

Heitor / Thiago

PROJECT REPORT DUE: Full ("paper-like") report of the project, along with code and data
 

June 24th

(in-person class)

Heitor / Thiago

ORAL PRESENTATION: Live seminar presenting the final project

 


Homework:

HW#

Subject

Date due

Link

Datasets

Homework & reports

1

Classification task with decision trees        

 

 

Homework should be submitted via Moodle

 

2

Classification task with decision rules      

3

Associative Analysis      

4

Clustering Analysis      

5

Feature Selection & Dimensionality Reduction      

6

Text Mining      

7

Image Mining      

8

Time-Series Mining      

9

       
10        

 


Additional support materials and links: