Title |
---|
Analysis of Data Preprocessing Increasing the Oversampling Ratio for Extremely Imbalanced Big Data Classification |
Abstract |
---|
The term "big data" has caught the attention of experts in the context of learning from data. It is used to describe the exponential growth and availability of data, both structured and unstructured. The design of effective models that can process and extract useful knowledge from these data represents an immense challenge. Focusing on classification problems, many real-world applications present a class distribution in which one or more classes are represented by a large number of examples while the other classes, which are precisely those of primary interest, have a negligible number of examples. This circumstance is known as the problem of classification with imbalanced datasets. In this work, we analyze a hypothesis aimed at increasing the accuracy of the underrepresented class when dealing with extremely imbalanced big data problems under the MapReduce framework. The performance of our solution has been analyzed in an experimental study carried out on the extremely imbalanced big data problem that was used in the ECBDL'14 Big Data Competition. The results obtained show that it is necessary to find a balance between the classes in order to obtain the highest precision. |
Year | DOI | Venue |
---|---|---|
2015 | 10.1109/Trustcom-BigDataSe-ISPA.2015.579 | TrustCom/BigDataSE/ISPA |
Keywords | DocType | Volume |
---|---|---|
Big data, Hadoop, MapReduce, Imbalance classification, Preprocessing | Conference | 2 |
ISSN | Citations | PageRank |
---|---|---|
2324-9013 | 6 | 0.48 |
References | Authors |
---|---|
13 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
S. del Río | 1 | 243 | 8.92 |
José Manuel Benítez | 2 | 888 | 56.02 |
Francisco Herrera | 3 | 27391 | 1168.49 |