Data Mining & Knowledge Discovery

(Mineração de Dados)

DAT003/CAIA003 - CPGEI & PPGCA

     

last update: 14/11/2019 08:36


General issues:


Scheduling (preliminary):

Week

Date

Subject

Lecture notes

Support materials

1

sep, 26th

Introduction: data mining & knowledge discovery. Presentation of real-world case studies

class1a class1b class1c class1d  

2

oct, 3rd

Types of data and their analysis. Data warehousing. Data collection (web crawling & web scraping) and dataset construction. Data visualization, PowerBI

class2a class2b class2d

dataset iris; software Orange; dataset Dados-desnormalizados.xlsx; dataset Horarios.xlsx; exemplo-SandDance; Tutorial-ETL-PowerBI; Youtube tutorials: 1, 2, 3

3

oct, 10th

Classification task: Decision trees. Models, concepts and evaluation metrics.

class3a class3b dataset eucalyptus   software Weka

4

oct, 17th

Classification task: Decision rules. Bagging and boosting

class4a  

5

oct, 24th

Associative analysis

class5a dataset vote                   dataset fertility

6

oct, 31st

Clustering task: K-means, hierarchical clustering, cluster quality

class7a dataset seeds  

7

nov, 7th

Big data. Deep learning

   

8

nov, 14th

Feature selection, dimensionality reduction, Principal Component Analysis (PCA)

class8a

9

nov, 21st

Multimedia mining

class9a class9b dataset owl x butterfly

10

nov, 28th

Text mining

class10a class10b dataset Spam_Mails StopIngl StopPort

11

dec, 5th

Anomaly detection. Regression    

12

dec, 12th

PROJECT PROPOSAL DUE: Short presentation and discussion of proposals for the final project, covering objective, dataset construction, methods, and analysis. Proposals will be reviewed, and students will be notified by e-mail whether their proposal is approved or must be resubmitted
 

2020 feb, 15th

PROJECT REPORT DUE: Full paper-style report, along with code and data
 

1st academic week/2020

ORAL PRESENTATION: To be scheduled for the first academic week of 2020    

 


Homework:

HW#

Subject

Date due

Link

Datasets

Upload link

1

Data visualization - PowerBI

oct, 10th hw1    

2

Decision trees

oct, 17th hw2 hepatitis soybean  

3

Classification rules

oct, 24th hw3 communities contraceptive  

4

Associative analysis

oct, 31st hw4 vlbw cafeteria  

5

 

nov, 7th hw5    

6

Cluster analysis

nov, 14th hw6 experimento dentition  

7

Feature selection, dimensionality reduction, PCA

nov, 21st hw7 gene-drug-test phoneme

8

Image mining

nov, 28th hw8 dataset hw8  

9

Text mining

dec, 5th hw9 stopwords_eng.txt  

10

         

 

  #1 #2 #3 #4 #5 #6 #7 #8 #9
  Data Visualization Decision Trees Classification Rules     Cluster Analysis Feature Selection    
Vinicius Couto Tasso                  
Jose Paulo Pereira Das Dores Savioli                  
Rodrigo Trevisani Juchen                  
Leonardo Schneider                  

 


Support materials: