Hierarchical Data Topology Based Selection for Large Scale Learning

Abstract: The amount of data available for data mining and knowledge discovery continues to grow rapidly in the era of Big Data. Genetic Programming (GP) algorithms, which are efficient machine learning techniques, face a new challenge: dealing with the sheer mass of the provided data. Active Sampling, already used in Active Learning, may be a good way to improve the training of Evolutionary Algorithms (EA) on very large data sets. This paper investigates the adaptation of Topology Based Selection (TBS) to massive learning datasets by means of Hierarchical Sampling. We propose to combine Random Subset Selection (RSS) with TBS to create the RSS-TBS method. Two variants are implemented, applied to the KDD intrusion detection problem, and compared with the original RSS and TBS techniques. The experimental results show that Hierarchical Sampling can lighten the high computational cost incurred by the original TBS when it is applied to large datasets.
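The abstract describes RSS-TBS only at a high level. The sketch below illustrates one plausible reading of the two-level (hierarchical) sampling idea. First, a random block of records is drawn from the full dataset, as in RSS. A topology-based step then selects the training subset from within that block only, so the topology bookkeeping never touches the whole dataset.

This is a minimal sketch under stated assumptions, not the authors' method. The function names (`rss_level`, `update_topology`, `tbs_level`) are hypothetical. So is the pair-weight rule, in which the edge between two cases is strengthened whenever one individual solves both, and so is the `threshold` parameter. These choices are loosely modeled on common descriptions of TBS.

```python
import random
from itertools import combinations

# Hypothetical two-level sampling sketch; names and rules are illustrative assumptions.

def rss_level(n_records, block_size, rng):
    """Level 1 (RSS-style): draw a random block of record indices from the full dataset."""
    return rng.sample(range(n_records), block_size)

def update_topology(weights, solved_cases):
    """Assumed TBS-style rule: strengthen the edge between every pair of cases
    solved by the same individual. `weights` maps frozenset({i, j}) -> int."""
    for i, j in combinations(sorted(solved_cases), 2):
        key = frozenset((i, j))
        weights[key] = weights.get(key, 0) + 1

def tbs_level(block, weights, subset_size, threshold, rng):
    """Level 2 (TBS-style): pick cases at random from the block, and after each pick
    drop the candidates strongly connected to it (edge weight >= threshold),
    so the subset favors cases that are dissimilar under the topology."""
    candidates = list(block)
    selected = []
    while candidates and len(selected) < subset_size:
        pick = candidates.pop(rng.randrange(len(candidates)))
        selected.append(pick)
        candidates = [c for c in candidates
                      if weights.get(frozenset((pick, c)), 0) < threshold]
    return selected

# Usage with arbitrary sizes: the topology is built only over the current block.
rng = random.Random(0)
block = rss_level(n_records=10_000, block_size=200, rng=rng)
weights = {}
update_topology(weights, block[:5])  # pretend one individual solved these five cases
subset = tbs_level(block, weights, subset_size=20, threshold=1, rng=rng)
```

Because `weights` only ever holds pairs from the current block, its size is bounded by the block size rather than by the full dataset. This is the kind of saving the hierarchical scheme aims at. The block and subset sizes above are arbitrary.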
Document type: Conference papers
https://hal.parisnanterre.fr//hal-02286148
Contributor: Sana Ben Hamida
Submitted on: Saturday, November 27, 2021 - 10:18:42 AM
Last modification on: Friday, December 17, 2021 - 1:28:02 PM

File

workshop - Version Finale.pdf
Files produced by the author(s)

Citation

Hmida Hmida, Sana Ben Hamida, Amel Borgi, Marta Rukoz. Hierarchical Data Topology Based Selection for Large Scale Learning. 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Jul 2016, Toulouse, France. pp.1221-1226, ⟨10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0186⟩. ⟨hal-02286148⟩
