SAT4SAT --- Statistical Analysis Toolkit for SAT Instances


Download version 1.0

SAT4SAT is an easy-to-use command-line toolkit on linux platform for automatic statistical analysis of SAT instances. It can automatically go over sets of SAT instances in CNF format and collect their statistics. Afterwards, you may perform a guided clustering for further analysis. Finally, you can visualize output data with various tools, such as TOPCAT or gnuplot (Neither are included in this toolkit).

At this moment, it mainly collects statistical features (not probing features) considered in SATzilla07, but more features may be included in the future. A list of provided features can be found in the release. It uses k-means++ algorithm to do clustering. For further information, please have a look at the README file contained in the release.


Introduction

SAT (Boolean Satisfiability Problem) is an important problem in many fields of computer science, such as Artificial Intelligence and Computational Complexity. It is the first problem proved to be NP-Complete, and now still sits in the heart of computational complexity. It also has real-world applications in various fields, such as VLSI design verification. Both theorectical and pratical interest attract researchers to develop better and better SAT solvers.

It seems that each SAT instance has its intrinsic hardness. Intuitively, this intrinsic hardness is correlated to number of variables and clauses for instances of a same type. However, in some categories, other indications of hardness are found. Take 3SAT, SAT instances with 3 literals in each clause, as an example. A phase-transition phenomenon of satisfiability is observed when varying the clause/variable ratio. In the critical region around c/v=4.3, 3SAT instances are particularly hard.

Therefore, it seems reasonable to study the link between statistical features and empirical hardness of SAT instances. This is also the approach of SATzilla, a portfolio of various SAT algorithms. SATzilla uses machine learning techniques to try to guess a correlation between statistical features and empirical hardness for different algorithms of SAT instances. This approach proved its effectiveness in SAT-competition 2007.

Statistical features may be useful in other aspects of SAT. The toolkit SAT4SAT is an initiative to provide easy automatic collection of SAT instances' statistical features.


System Requirement

There are 3 main parts in SAT4SAT: a statistic feature calculation program written in C, an implementation of k-means++ written in C++, and a Perl script that provides automation of analysis. Therefore, SAT4SAT will require gcc, g++ and perl. As most linux distribution includes these 3 components, there should not be any problem on your linux machine. In the same way, it should also work well on Mac OS, though never tested.

For Windows users, you may want to try Cygwin.



By FANG Wenjie
Attribution-Noncommercial-No Derivative Works 3.0 Unported