Tool for building structured data out of random input.
When you fuzz a project that is complex enough to have both a syntax parser and core functionality behind it, your fuzzer will probably spend a lot of CPU time in the syntax parser. Fuzzing the syntax parser is a good thing, but it does not really help to fuzz the core functionality. One general approach is to generate input data that is syntactically correct from the perspective of the syntax parser, while remaining totally random from the perspective of the core functionality.
LibBlobStamper is a tool for building converters from random binary data into random data with the required syntax.
LibBlobStamper is written in C++ and can be used for generating both text data with the desired syntax and structured binary data (e.g. C structures). You can build the converter right into your binary test unit, so the conversion will be opaque to both fuzzers and DSE tools.
A Blob is a chunk of binary data that presumably came from a fuzzer; it is considered to be random. Blob data is used by Stamps for constructing structured values (syntactically correct strings or C structures). When a Stamp uses a chunk of the Blob's data, that data is removed from the Blob. You can apply a Stamp to a Blob as many times as you like, until the Blob runs out of data.
A Stamp is a C++ object that "bites" a chunk of binary data off a Blob and converts it into a certain structured representation (a text string with the syntax provided by the stamp, or a C structure).
char data[] ="abcdefghijk";
Blob blob(data, strlen(data)); // Blob with "random" data
StampArithm<short int> stamp; // Stamp for getting short integer (both string and value representations)
std::string s;
short int i;
s = stamp.ExtractStr(blob); // bite short int data from blob and save it to a string. Will get "25185"
i = stamp.ExtractValue(blob); // bite short int data from blob and save it to a short int variable. Will get 25699
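The two numbers in the comments above can be checked by hand. The following is a minimal sketch, not LibBlobStamper code: it assumes the stamp takes sizeof(short int) == 2 bytes from the front of the data and interprets them in little-endian order, as a raw memory copy would on a typical x86 machine. The helper name biteShortLE is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of LibBlobStamper): read two bytes starting
// at `offset` and combine them in little-endian order into a 16-bit value.
std::int16_t biteShortLE(const std::string &data, std::size_t offset)
{
    assert(offset + 2 <= data.size());   // need two bytes to bite
    const auto lo = static_cast<unsigned char>(data[offset]);
    const auto hi = static_cast<unsigned char>(data[offset + 1]);
    return static_cast<std::int16_t>(static_cast<std::uint16_t>(lo | (hi << 8)));
}
```

With the data "abcdefghijk", biteShortLE(data, 0) gives 25185 ('a' is 0x61 and 'b' is 0x62, so the value is 0x6261), and biteShortLE(data, 2) gives 25699 (0x6463), which matches the two calls above.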
As you can see, Stamps can extract values of various types. Each extracted type has its own extract method:
- ExtractStr returns a std::string value. The string is formatted according to the syntax implemented in this extract method.
- ExtractValue returns the "value" of a C structure or of another C variable. In the example above it is the value of a short int variable.
- ExtractPValue is the same as ExtractValue, but returns a pointer to the value. Or, more precisely, a sized_ptr<T> pointer (see below (FIXME not written yet)).
- ExtractBin is the same as ExtractPValue, but returns the extracted structure as an array of characters (std::vector<char>). You can work with it as a binary buffer or cast it to the desired type manually.
A Stamp must implement at least one of the extract methods.
In your own stamp you will probably implement only the extract methods you need, either string or binary.
StampArithm<T> has all of them, but this seems to be an exceptional case.
The amount of data that a Stamp can consume is called the Stamp Size. Depending on the Min Stamp Size and Max Stamp Size, Stamps can be divided into three groups:
- Fixed Size Stamps: the Stamp consumes a fixed amount of data (Min Stamp Size == Max Stamp Size). For example, a StampArithm<T> stamp always consumes sizeof(T) bytes.
- Variated Size Stamps: Min Stamp Size != Max Stamp Size. For example, a stamp that generates a string of random Latin letters 3 to 16 characters long. It consumes 3..16 bytes and "normalizes" them to Latin character bytes.
- Unbounded Size Stamps: the Stamp has a Min Size, but will consume any amount of data if provided.
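To make the Variated Size example more concrete, here is a sketch of what "normalizing" bytes to Latin letters could look like. The modulo mapping and the function name normalizeToLatin are assumptions made for illustration; this text does not specify how a real stamp does it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical illustration: turn 3..16 raw bytes into lowercase Latin letters.
// Each byte is mapped to 'a'..'z' by taking it modulo 26 (an assumed scheme).
std::string normalizeToLatin(const std::string &raw)
{
    const std::size_t minSize = 3, maxSize = 16;
    assert(raw.size() >= minSize);   // not enough data for this stamp
    // Use at most maxSize bytes, even if more are offered.
    const std::size_t n = raw.size() < maxSize ? raw.size() : maxSize;
    std::string out;
    for (std::size_t i = 0; i < n; ++i)
        out += static_cast<char>('a' + static_cast<unsigned char>(raw[i]) % 26);
    return out;
}
```

Under this scheme the three bytes 0, 1 and 25 become "abz", and a 20-byte input yields a 16-character string.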
Min and Max Stamp Sizes are available via the minSize() and maxSize() methods.
For Unbounded Size Stamps maxSize() is set to -1.
Also please note that stamps are greedy: they will try to consume all the data they can.
E.g. an Unbounded Size Stamp will consume all data from the Blob.
A Variated Size Stamp will try to eat maxSize() bytes, but will be satisfied with anything greater than or equal to minSize().
To limit a Stamp's appetite you should use Galleys.
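The greedy rule can be summarized in a few lines. This is a sketch with a hypothetical helper name, bytesToConsume; it follows the description above, with maxSize == -1 meaning unbounded. What a real stamp does when the Blob holds fewer than minSize bytes is not described here, so the sketch simply returns 0 in that case (an assumption).

```cpp
#include <cassert>

// Hypothetical helper: how many bytes a greedy stamp would take from a blob
// that still holds `available` bytes. maxSize == -1 means "unbounded".
int bytesToConsume(int minSize, int maxSize, int available)
{
    assert(minSize >= 0 && available >= 0);
    if (available < minSize)
        return 0;                 // not enough data (this handling is an assumption)
    if (maxSize == -1 || available < maxSize)
        return available;         // take everything that is there
    return maxSize;               // bounded stamp: stop at maxSize
}
```

For an 11-byte Blob, a Fixed Size stamp with sizes 2/2 takes 2 bytes, a Variated Size stamp with sizes 3/16 takes all 11, and an Unbounded stamp also takes all 11. This is why a single greedy stamp leaves nothing for the next one unless something limits it.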
A Galley is a way to squeeze several Stamps into one object. You can think of a LibBlobStamper Galley as a letterpress galley: you have several stamps, you put them into a galley, and now you have one bigger stamp. You will definitely need a Galley if you want to split Blob data between several Unbounded Stamps. Each Stamp tries to use all the data, and a Galley is the way to divide the available data between Stamps. For Variated Stamps the story is the same: they should not always get all the data they want.
There are two types of Galleys in LibBlobStamper now: GalleyVector and GalleySet.
A Galley Vector is used to slice all Blob data into parts using one selected stamp.
For a Fixed Size Stamp, the blob is chopped into parts that fit the Stamp, and all these parts are fed to the Stamp.
For Variated and Unbounded Stamps the Galley uses a tricky algorithm to decide how to split the Blob data (the algorithm will be discussed later) and then applies the target stamp to each data chunk.
The Galley returns std::vector<std::string> or std::vector<T>, depending on which extract type you are going to use.
Example:
char data[] ="abcdefghijk";
Blob blob1(data, strlen(data)); // Blob with "random" data
Blob blob2(data, strlen(data)); // Another Blob with same data
StampArithm<short int> stamp; // Stamp for short integer data (both string and value)
GalleyVectorStr galley_s(stamp);
GalleyVectorV<short int> galley_v(stamp);
std::vector<std::string> res_s = galley_s.ExtractStrVector(blob1);
std::vector<short int> res_v = galley_v.ExtractValuesVector(blob2);
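For the Fixed Size case the chopping is simple enough to sketch without the library. The function below is a hypothetical stand-in, not the GalleyVector implementation: it assumes 2-byte parts decoded in little-endian order, and it assumes that a trailing byte too short to form a full part is simply left unused.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a Galley Vector over a fixed-size 2-byte stamp:
// cut the data into consecutive 2-byte parts and decode each part as a
// little-endian 16-bit integer. A trailing odd byte is ignored (assumption).
std::vector<std::int16_t> chopIntoShorts(const std::string &data)
{
    std::vector<std::int16_t> result;
    for (std::size_t i = 0; i + 2 <= data.size(); i += 2) {
        const auto lo = static_cast<unsigned char>(data[i]);
        const auto hi = static_cast<unsigned char>(data[i + 1]);
        result.push_back(static_cast<std::int16_t>(static_cast<std::uint16_t>(lo | (hi << 8))));
    }
    assert(result.size() == data.size() / 2);   // one value per full 2-byte part
    return result;
}
```

Under these assumptions, chopIntoShorts("abcdefghijk") yields five values, 25185, 25699, 26213, 26727 and 27241, and the final byte 'k' is not used.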
A Galley Set allows you to apply stamps of different types simultaneously. Like Galley Vector, it uses a tricky algorithm to divide Blob data between stamps, but in this case these are different Stamps.
For now Galley Set works with the String and Binary extracted types. It is not quite clear how to implement a Galley with the Values extracted type using C++ facilities.
Example:
char data[] ="abcdefghijk";
Blob blob(data, strlen(data)); // Blob with "random" data
StampArithm<short int> stamp_i; // Stamp for short integer data (both string and value)
StampArithm<float> stamp_f; // Stamp for float numeric data (both string and value)
GalleySetStr galley({stamp_i, stamp_f});
std::vector<std::string> res = galley.ExtractStrSet(blob);
Galleys and Stamps inherit from the same base class, so you can make a Stamp from a Galley by implementing the appropriate Extract method. This will be explained below in the "Creating Stamps" section.
LibBlobStamper has been designed with the idea that it should be able to create strings with nested syntax (e.g. arithmetic expressions).
This work is still in progress and is too raw to be documented properly, but you can explore examples/exampleZZ.cpp to see the current status of Stamp recursion.
General idea: inherit from the base class that provides the Extract method you need (e.g. inherit from StampBaseStr to get ExtractStr()).
Implement the minSize() and maxSize() methods, and the Extract method you have chosen.
Normally you will seldom need to work with raw blob data. Most probably you will combine existing basic stamps to create a complex one.
Let's imagine you need to create a stamp for complex numbers a + ib (e.g. 12 + 3i).
Let's imagine that a and b are not really big integers.
To build this stamp we will use two arithmetic stamps that give us the text representation of a short int, and we will combine them the way we want.
Class definition will look like this. We define stamps we will use while building string right inside the class.
class ComplexIntStamp: public StampBaseStr
{
  protected:
    StampArithm<short int> stampA, stampB;
  public:
    virtual int minSize() override;
    virtual int maxSize() override;
    virtual std::string ExtractStr(Blob &blob) override;
};
Actually, here we could have a single StampArithm<short int> stamp and apply it twice.
But to make the example clearer, we explicitly declare both stamps.
As we are going to apply each stamp only once, we can calculate the min and max sizes of our new stamp as the sums of the min and max sizes of the stamps we have used:
int ComplexIntStamp::minSize()
{
    return stampA.minSize() + stampB.minSize();
}
int ComplexIntStamp::maxSize()
{
    return stampA.maxSize() + stampB.maxSize();
}
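Summing works here because both component stamps are Fixed Size. If a component were an Unbounded Size Stamp, its maxSize() would be -1 and a plain sum would give a misleading number. The sketch below uses a hypothetical helper, combineMaxSize, to show one way of keeping the -1 convention when combining sizes; it is not taken from the library.

```cpp
#include <cassert>

// Hypothetical helper: combine the maxSize() values of two stamps that are
// each applied once. -1 means "unbounded", and an unbounded part makes the
// combined stamp unbounded too.
int combineMaxSize(int maxA, int maxB)
{
    assert(maxA >= -1 && maxB >= -1);
    if (maxA == -1 || maxB == -1)
        return -1;
    return maxA + maxB;
}
```

For two 2-byte stamps this gives 4, the same as the plain sum above, while combining a 2-byte stamp with an unbounded one gives -1 rather than 1.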