Spam emails are something we all despise. Apart from bombarding us with irrelevant messages, they can carry malware and phishing content that does real harm. Spam filters play a vital role in protecting users from them. Spam detector models can catch spam automatically, but they first have to be trained on large email datasets that are labeled by hand. This is the problem researchers at Sinhgad Institute of Technology Lonavala in India hope to address.

“Spam detection is essential since it can ensure justice for the sellers and retain the trust of the buyer on the online stores,” said Vikas Samarthrao Kadam, one of the researchers involved in the study, which was published in the International Journal of Intelligent Robotics and Applications. “In contrast with other methods, it improves the training speed and efficiency of classification. Our model could improve the quality of life for people who receive large amounts of emails, allowing them to browse through their email smoothly and only use their accounts for their desired purpose.”

The team developed a model based on multi-objective feature selection and an adaptive capsule network, and trained it on both image and text datasets. The deep learning model is said to be easy to implement and quick to train. Kadam said initial evaluations show that the new model is more accurate than existing methods. The team noted that this could improve users' security and make it easier for them to skip past irrelevant emails.

“Our model also reduces training [time] and leads to greater efficiency of classification,” said Kadam in an interview with TechXplore. “In contrast with other models, it increases the convergence rate of the spam email detection, achieving better results.”

However, the group said the model still needs further development to reach the best possible speed and precision. Once it is ready, the spam filtering technique could be deployed at scale, including on services such as Gmail, Yahoo Mail, and Outlook.

“The security of spam detection and filtration systems is crucial to achieve better accuracy and reliable results, which can be improved in the future using ensemble learning,” Kadam said. “The false positive rate of many models is still higher than required, but it should be reduced to the smallest possible value in the future. Real-time spam classification is much needed, as most of the proposed models do not work well with real-time data… Almost all researchers present their results based on the accuracy, precision and recall of their models, but we feel that the time complexity of machine learning models should also be considered as an evaluation metric.”

“Some researchers show promising results in the process of feature extraction using a bag of words, as they claim that the email header is as important for spam detection as the content of the body,” he added. “So, deep feature extraction of the header line could also be considered in the future.”
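
For readers who want to see what the metrics Kadam mentions look like in practice, here is a minimal Python sketch. It is not the team's model: the keyword rule `toy_classifier`, the word list, and the four sample emails are all hypothetical stand-ins invented for illustration. The sketch builds a naive bag of words from both the subject line and the body, then reports accuracy, precision, recall, and false positive rate, and measures wall-clock prediction time as a rough practical proxy for the cost Kadam suggests tracking.

```python
import time
from collections import Counter


def bag_of_words(text):
    """Count lowercase, whitespace-separated tokens (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def evaluate(y_true, y_pred):
    """Compute common spam-filter metrics, with 1 = spam and 0 = legitimate mail."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate mail flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate mail passed
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam missed
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical word list; a trained model would learn its own features.
SPAM_WORDS = frozenset({"free", "winner", "prize"})


def toy_classifier(subject, body):
    """Hypothetical keyword rule standing in for a trained model."""
    # Merge header (subject) and body counts so both contribute features.
    counts = bag_of_words(subject) + bag_of_words(body)
    return 1 if any(counts[word] > 0 for word in SPAM_WORDS) else 0


# Invented sample data: (subject, body, label).
emails = [
    ("You are a WINNER", "Claim your prize now", 1),
    ("Meeting notes", "Agenda attached for Monday", 0),
    ("Free lunch on Friday", "The team is ordering pizza", 0),
    ("Invoice overdue", "Click this link to verify your account", 1),
]

start = time.perf_counter()
predictions = [toy_classifier(subject, body) for subject, body, _ in emails]
elapsed = time.perf_counter() - start

labels = [label for _, _, label in emails]
print(evaluate(labels, predictions))
print(f"prediction time: {elapsed:.6f} s")
```

On this invented data the rule wrongly flags the "Free lunch" message and misses the phishing-style invoice email, which shows why a low false positive rate and good recall are separate goals.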