Sa2px: A tool to translate SpamAssassin regular expression rules to POSIX Conference Paper

abstract

  • This paper presents sa2px, a software tool that translates regular expressions (regexps) in SpamAssassin (SA) rules into the POSIX format. The translated regexps can be implemented on different platforms, which better separates the composition of spam filtering rules from the on-line operations. Sa2px consists of three layers of functions. The first layer translates plugins and special formats into their equivalent basic SA formats. The second layer uses a syntax conversion approach to translate basic SA rules into the POSIX format. The third layer uses a backward grouping algorithm to group multiple regexps together so that they can be packed into a DFA table using Flex or similar tools. Overall, sa2px can translate regexps across the whole rule set (uri, body, header, rawbody and the ReplaceTags plugin), and it translates 84.5% of 1115 SA regexp rules. In comparison, sa-compile can translate 296 of 453 body rules. The translated rules are then clustered into several main groups, except for some cases in which the regexp structures lead to explosive state growth. Finally, DFA tables and (action number, rule name) pairs are generated. Experimental results show that the DFA-table-based implementation of the translated regexps reduces execution time by 66% compared with Perl-based string scanning (with sa-compile activated) in a process-level parallel environment.
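
    To make the second layer's "syntax conversion" idea concrete, the following is a minimal Python sketch. It is not the sa2px implementation; it only assumes that conversion means rewriting Perl-only syntax into POSIX ERE equivalents and rejecting constructs that have none. The function name `perl_to_posix`, the `SHORTHAND` table, and the set of rejected constructs are all hypothetical choices made for illustration.

```python
# Illustrative subset: Perl shorthand classes and POSIX ERE equivalents.
# (Hypothetical table; not taken from sa2px.)
SHORTHAND = {
    "d": "[[:digit:]]", "D": "[^[:digit:]]",
    "s": "[[:space:]]", "S": "[^[:space:]]",
    "w": "[[:alnum:]_]", "W": "[^[:alnum:]_]",
}


def perl_to_posix(pattern):
    """Return a POSIX ERE string, or None if this sketch cannot translate it."""
    out = []
    i = 0
    in_class = False  # True while scanning inside a [...] bracket expression
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Escapes inside bracket expressions and dangling backslashes
            # are not handled by this sketch.
            if in_class or i + 1 >= len(pattern):
                return None
            nxt = pattern[i + 1]
            if nxt in SHORTHAND:
                out.append(SHORTHAND[nxt])
            elif nxt in "bB" or nxt.isdigit():
                return None  # word boundaries, backreferences: rejected
            else:
                out.append(ch + nxt)  # escaped literal, kept as-is
            i += 2
            continue
        if not in_class and pattern.startswith("(?", i):
            if pattern.startswith("(?:", i):
                out.append("(")  # non-capturing group becomes a plain group
                i += 3
                continue
            return None  # lookaround, inline modifiers, etc.: rejected
        if not in_class and ch == "?" and out and out[-1] in ("*", "+", "?", "}"):
            return None  # lazy quantifier: no POSIX counterpart
        if ch == "[" and not in_class:
            in_class = True
        elif ch == "]" and in_class:
            in_class = False
        out.append(ch)
        i += 1
    return "".join(out)


# Toy rules invented for this example (not real SpamAssassin rules).
rules = {
    "TOY_PRICE": r"(?:free|cheap)\s+\d+",
    "TOY_BOUNDARY": r"\bviagra\b",
    "TOY_LAZY": r"a.*?b",
}
translated = {name: perl_to_posix(rx) for name, rx in rules.items()}
ok = [name for name, rx in translated.items() if rx is not None]
print(translated["TOY_PRICE"])
print(f"translated {len(ok)} of {len(rules)} toy rules")
```

    A translation rate like the one reported in the abstract would then be the fraction of rules for which such a converter returns a pattern rather than rejecting it; the toy rules here say nothing about the real figure.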

published proceedings

  • 6th Conference on Email and Anti-Spam, CEAS 2009

author list (cited authors)

  • Pu, S., Tan, C. C., & Liu, J. C.

complete list of authors

  • Pu, S; Tan, CC; Liu, JC

publication date

  • January 2009