Semi-supervised deep learning approach to break common CAPTCHAs

Manual data annotation is a time-consuming activity. This paper presents a novel strategy for automatically training a CAPTCHA-breaking system with no manual dataset creation. We demonstrate the feasibility of the attack against a text-based CAPTCHA scheme using network infrastructure similar to that used for Denial of Service attacks. The main goal of our research is to present a possible vulnerability in CAPTCHA systems when a brute-force attack is combined with transfer learning. The classification step uses a simple convolutional neural network with 15 layers. The training stage uses an automatically prepared dataset, created without any human intervention, and transfer learning to fine-tune the deep neural network classifier. The designed system achieved 80% classification accuracy after 6 fine-tuning steps on a 5-digit text-based CAPTCHA system. The results presented in this paper suggest that even a simple attack with a large number of attacking computers can be an effective alternative to current CAPTCHA-breaking systems.
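
The automatic dataset creation described above can be read as a feedback loop: the classifier guesses, the CAPTCHA server accepts or rejects the guess, and only accepted guesses are kept as labelled samples for the next fine-tuning step. The abstract gives no implementation details, so the following is only an illustrative sketch under that assumption. The names `collect_confirmed`, `toy_predict` and `toy_submit` are hypothetical, and the toy stand-ins replace the real network and server so that the example runs offline.

```python
from typing import Callable, Iterable, List, Tuple


def collect_confirmed(
    images: Iterable[str],
    predict: Callable[[str], str],
    submit: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Return (image, label) pairs whose guessed label the server accepted."""
    confirmed = []
    for image in images:
        guess = predict(image)
        # Assumption: the accept/reject response is the only supervision used.
        if submit(image, guess):
            confirmed.append((image, guess))
    return confirmed


# Toy stand-ins (hypothetical): image ids mapped to their hidden 5-digit answers.
ANSWERS = {"img-a": "12345", "img-b": "67890", "img-c": "24680"}


def toy_predict(image: str) -> str:
    # Pretend classifier: correct on img-a and img-c, wrong on img-b.
    return {"img-a": "12345", "img-b": "00000", "img-c": "24680"}[image]


def toy_submit(image: str, guess: str) -> bool:
    # Pretend server: accepts a guess only if it matches the hidden answer.
    return ANSWERS[image] == guess


dataset = collect_confirmed(ANSWERS, toy_predict, toy_submit)
print(dataset)
```

In a real setting the confirmed pairs would feed a fine-tuning step of the classifier, and the loop would be repeated. How many samples each step needs and how the network is updated are not stated in the abstract and are not modelled here.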

Keywords

CAPTCHA, Semi-supervised learning, Convolutional Neural Networks

Citation

NEURAL COMPUTING & APPLICATIONS. 2021, vol. 33, issue 20, p. 13333-13343.
https://link.springer.com/article/10.1007%2Fs00521-021-05957-0

Document type

Peer-reviewed

Document version

Accepted version

Date of access to the full text

2022-04-13

Language of document

en

Document licence

(C) Springer