Shardul Vikram (2011-12). Using Novel Image-based Interactional Proofs and Source Randomization for Prevention of Web Bots. Master's Thesis. Thesis uri icon

abstract

  • This work presents our efforts on preventing the web bots to illegitimately access web resources. As the first technique, we present SEMAGE (SEmantically MAtching imaGEs), a new image-based CAPTCHA that capitalizes on the human ability to define and comprehend image content and to establish semantic relationships between them. As the second technique, we present NOID - a "NOn-Intrusive Web Bot Defense system" that aims at creating a three tiered defence system against web automation programs or web bots. NOID is a server side technique and prevents the web bots from accessing web resources by inherently hiding the HTML elements of interest by randomization and obfuscation in the HTML responses. A SEMAGE challenge asks a user to select semantically related images from a given image set. SEMAGE has a two-factor design where in order to pass a challenge the user is required to figure out the content of each image and then understand and identify semantic relationship between a subset of them. Most of the current state-of-the-art image-based systems like Assira only require the user to solve the first level, i.e., image recognition. Utilizing the semantic correlation between images to create more secure and user-friendly challenges makes SEMAGE novel. SEMAGE does not suffer from limitations of traditional image-based approaches such as lacking customization and adaptability. SEMAGE unlike the current Text based systems is also very user friendly with a high fun factor. We conduct a first of its kind large-scale user study involving 174 users to gauge and compare accuracy and usability of SEMAGE with existing state-of-the-art CAPTCHA systems like reCAPTCHA (text-based) and Asirra (image-based). The user study further reinstates our points and shows that users achieve high accuracy using our system and consider our system to be fun and easy. We also design a novel server-side and non-intrusive web bot defense system, NOID, to prevent web bots from accessing web resources by inherently hiding and randomizing HTML elements. Specifically, to prevent web bots uniquely identifying HTML elements for later automation, NOID randomizes name/id parameter values of essential HTML elements such as "input textbox", "textarea" and "submit button" in each HTTP form page. In addition, to prevent powerful web bots from identifying special user-action HTML elements by analyzing the content of their accompanied "label text" HTML tags, we enhance NOID by adding a component, Label Concealer, which hides label indicators by replacing "label text" HTML tags with randomized images. To further prevent more powerful web bots identifying HTML elements by recognizing their relative positions or surrounding elements in the web pages, we enhance NOID by adding another component, Element Trapper, which obfuscates important HTML elements' surroundings by adding decoy elements without compromising usability. We evaluate NOID against five powerful state-of-the-art web bots including XRumer, SENuke, Magic Submitter, Comment Blaster, and UWCS on several popular open source web platforms including phpBB, Simple Machine Forums (SMF), and Wordpress. According to our evaluation, NOID can prevent all these web bots automatically sending spam on these web platforms with reasonable overhead.
  • This work presents our efforts on preventing the web bots to illegitimately access web resources. As the first technique, we present SEMAGE (SEmantically MAtching imaGEs), a new image-based CAPTCHA that capitalizes on the human ability to define and comprehend image content and to establish semantic relationships between them. As the second technique, we present NOID - a "NOn-Intrusive Web Bot Defense system" that aims at creating a three tiered defence system against web automation programs or web bots. NOID is a server side technique and prevents the web bots from accessing web resources by inherently hiding the HTML elements of interest by randomization and obfuscation in the HTML responses.



    A SEMAGE challenge asks a user to select semantically related images from a given image set. SEMAGE has a two-factor design where in order to pass a challenge the user is required to figure out the content of each image and then understand and identify semantic relationship between a subset of them. Most of the current state-of-the-art image-based systems like Assira only require the user to solve the first level, i.e., image recognition. Utilizing the semantic correlation between images to create more secure and user-friendly challenges makes SEMAGE novel. SEMAGE does not suffer from limitations of traditional image-based approaches such as lacking customization and adaptability. SEMAGE unlike the current Text based systems is also very user friendly with a high fun factor. We conduct a first of its kind large-scale user study involving 174 users to gauge and compare accuracy and usability of SEMAGE with existing state-of-the-art CAPTCHA systems like reCAPTCHA (text-based) and Asirra (image-based). The user study further reinstates our points and shows that users achieve high accuracy using our system and consider our system to be fun and easy.



    We also design a novel server-side and non-intrusive web bot defense system, NOID, to prevent web bots from accessing web resources by inherently hiding and randomizing HTML elements. Specifically, to prevent web bots uniquely identifying HTML elements for later automation, NOID randomizes name/id parameter values of essential HTML elements such as "input textbox", "textarea" and "submit button" in each HTTP form page. In addition, to prevent powerful web bots from identifying special user-action HTML elements by analyzing the content of their accompanied "label text" HTML tags, we enhance NOID by adding a component, Label Concealer, which hides label indicators by replacing "label text" HTML tags with randomized images. To further prevent more powerful web bots identifying HTML elements by recognizing their relative positions or surrounding elements in the web pages, we enhance NOID by adding another component, Element Trapper, which obfuscates important HTML elements' surroundings by adding decoy elements without compromising usability.



    We evaluate NOID against five powerful state-of-the-art web bots including XRumer, SENuke, Magic Submitter, Comment Blaster, and UWCS on several popular open source

    web platforms including phpBB, Simple Machine Forums (SMF), and Wordpress. According to our evaluation, NOID can prevent all these web bots automatically sending spam on these web platforms with reasonable overhead.

publication date

  • December 2011