Data Mining & Knowledge Discovery

(Portuguese course title: Mineração de Dados)

DAT003/CAIA003 - CPGEI & PPGCA

     

last update: 02/12/2019 17:22


General issues:


Scheduling (preliminary):

Week

Date

Subject

Lecture notes

Support materials

1

sep, 26th

Introduction to data mining & knowledge discovery. Presentation of real-world case studies

class1a class1b class1c class1d  

2

oct, 3rd

Types of data and their analysis. Data warehousing. Data collection (web crawling & web scraping) and dataset construction. Data visualization with Power BI

class2a

class2b  class2d

dataset iris; software Orange; dataset Dados-desnormalizados.xlsx; dataset Horarios.xlsx; exemplo-SandDance; Tutorial-ETL-PowerBI; YouTube tutorials: 1, 2, 3

3

oct, 10th

Classification task: Decision trees. Models, concepts and evaluation metrics.

class3a class3b

dataset eucalyptus; software Weka

4

oct, 17th

Classification task: Decision rules. Bagging and boosting

class4a  

5

oct, 24th

Associative analysis

class5a

dataset vote; dataset fertility

6

oct, 31st

Clustering task: K-means, hierarchical clustering, cluster quality

class7a

dataset seeds

7

nov, 7th

Big data. Deep learning

   

8

nov, 14th

Feature selection, dimensionality reduction, Principal Component Analysis (PCA)

class8a

9

nov, 21st

Multimedia mining

class9a

dataset owl x butterfly

10

nov, 28th

Text mining

class10a class10b

dataset Spam_Mails; StopIngl; StopPort

11

dec, 5th

PROJECT PROPOSAL DUE: Short presentation and discussion of final-project proposals, including objective, dataset construction, methods, and analysis. Proposals will be reviewed, and students will be notified by e-mail whether each proposal is approved or must be resubmitted.

12

dec, 12th

 spare    
 

feb, 15th, 2020

PROJECT REPORT DUE: Full paper-style report, submitted together with source code and data.
 

1st academic week of 2020

ORAL PRESENTATION: To be scheduled for the first academic week of 2020    

 


Homework:

HW#

Subject

Due date

Link

Datasets

Upload link

1

Data visualization - PowerBI

oct, 10th

hw1

2

Decision trees

oct, 17th

hw2

hepatitis, soybean

3

Classification rules

oct, 24th

hw3

communities, contraceptive

4

Associative analysis

oct, 31st

hw4

vlbw, cafeteria

5

Big data

nov, 7th

hw5

6

Cluster analysis

nov, 14th

hw6

experimento, dentition

7

Feature selection, dimensionality reduction, PCA

nov, 21st

hw7

gene-drug-test, phoneme

8

Image mining

nov, 28th

hw8

dataset hw8

9

Text mining

dec, 5th

hw9

stopwords_eng.txt

10

         

 

#1 Data Visualization | #2 Decision Trees | #3 Classification Rules | #4 Associative Analysis | #5 Big Data | #6 Cluster Analysis | #7 Feature Selection | #8 Image Mining | #9 Text Mining
Vinicius Couto Tasso                  
Jose Paulo Pereira Das Dores Savioli                  
Rodrigo Trevisani Juchen                  
Leonardo Schneider                  

 


Support materials: