Tuesday, October 21, 2008

Boosting for Imbalanced Data

I am currently attending Arie Yanuar's final project (TA) defense on boosting for imbalanced data. Some notes:
• Does the synthetic data reflect real cases? How is a real dataset mapped to the synthetic data?
• Critique the synthetic data.
• What about other factors, such as the attribute type (e.g. categorical vs. numeric)?
• What difficulties does imbalanced data cause? (What does the imbalance problem mean?)
• The evaluation measure covers only the minority class. It would be better to report it for the majority class as well.
• Future research on the data: include the cost.
• Write up the generic handling of imbalance in boosting methods, and draw a conclusion about which kind of approach is better for the imbalance problem.
• Survey paper and tutorial slides on boosting for the imbalance problem: submit to a conference.
• What is new in this TA? The noisy dataset.
• Is the boosting approach suitable, from a theoretical point of view, for the imbalance problem?
• No sampling?
• How is the data split into training and test sets? What are the training and test data?
• Are the training and test data of the same type?
• The number of iterations: Arie chose 10.
• How do boosting methods handle the imbalance problem? Are boosting methods designed for imbalance better than those that are not?
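
On the evaluation-measure note above, here is a minimal sketch of reporting metrics for both the minority and the majority class rather than only one of them. It is not taken from Arie's TA: the function name `per_class_report`, the toy labels, and the choice of precision, recall, F1, and G-mean are all assumptions made for illustration.

```python
from math import sqrt

def per_class_report(y_true, y_pred, labels=(0, 1)):
    """Return precision, recall and F1 for every class, plus the G-mean.

    Hypothetical helper for illustration; labels 0/1 are assumed to be
    the majority and minority class respectively.
    """
    report = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    # G-mean for two classes: geometric mean of the two per-class recalls,
    # so a classifier that ignores either class scores 0.
    report["g_mean"] = sqrt(report[labels[0]]["recall"] * report[labels[1]]["recall"])
    return report

# Toy data (made up): 8 majority examples (0) and 2 minority examples (1).
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 7 + [1] + [1, 0]

report = per_class_report(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

for label in (0, 1):
    print(label, report[label])
print("g_mean", report["g_mean"], "accuracy", accuracy)
```

On this toy data the overall accuracy looks reasonable while the minority recall is much lower than the majority recall, which is the kind of gap that only shows up when both classes are reported side by side.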