ATI - Early Warning: Filtering and Searching

This is how to search and or filter across the ATI - Early Warning list. If the expression matches the criteria, then these rows are returned. The search filter commands are 'packed-to-the-rafters' with features.

The search filters explained today are in the Intelligence Center. The console has more advanced CVE expressions in ASQ.

We support the simplest search - a keyword search such as “remote code execution” as well as fully-blown filter expressions that have over 80 transformations. The 🔑 thing about search is that they have to return a true or false value for every row, so think about it in these terms, what can you filter for or filter out based on some expression.

Some important considerations

Notification Filters are by default case insensitive; we do this as CVE Database has lots of different ways vendors and products are spelled with mixed cases. This also means you can search for words with whatever typing you want, such as ‘Microsoft, ‘Microsoft,’ etc. We also support raw search; however, we think in most cases, you won’t have to do this. If raw is something you do a lot of in the future, let us know, and we will make your use case some new transformations.

Filters can use either keyword search or filter expressions.

Some examples of where it will work and not work are:

  • "remote code execution" is a keyword search, is allowed

  • vendor="Microsoft" and "remote code execution" will not work

  • vendor="Microsoft" and description.str.contains("remote code execution") will work

Please Note:

An empty filter returns the whole ATI - Early Warning list, an empty filter means don’t filter the list.

An example

An example of writing a search for Cisco.

Such as these filters:

"Internal Network Team" : vendor="cisco"
"External Supported Network Team" : vendor.str.startswith("palo"")

Understanding Filters

There are three parts to understanding filters. The first is the name of the columns and what they mean, and what type they are as certain types such as data fields only allow certain operations and the filter expression language.

ATI - Early Warning Columns

The ATI - Early Warning has a number of columns and definitions.

Column

Description

Type

Column

Description

Type

CVE (cve)

The CVE is usually the Mitre / NVD CVE. However, there are times where CTCI will create a CVE or prefix the CVE with a few letters such as R for Research, I for Intel, H for Honeypot, and p for provisional. An example would be HR-CVE-XXXX-XX, where we have a CVE we see that has matched our Honeypot and has been researched.

string

Year (year_)

The year of the CVE, now we usually extract this from the CVE-YYYY-XXX, where YYYY is the year. However, there are a few CVEs that have been retrospective given different numbers. A year can be filtered and grouped on.

integer

NVD Score (base_score)

This is the NVD base score for this CVE. For unknown values, sometimes this is set to Unknown or sometimes set to a value. When it is, we will add that to the External note.

float

CVE Date (cve_date)

This was when the CVE was published to Mitre, which may or may not had had any details or scoring when it was published.

the string then converted to a date-time object in the back end

NVD Severity (base_severity)

This is the Severity that is found in NVD. When it is an unknown value, we sometimes then set this value. If we do, then this is added to that External note.

string

Vendor (vendor)

This is the vendor; if there is a large number of CPEs for vendors, we still take what Mitre has published. Sometimes for consistency, we will change the capitalization of the Vendor.

string

Product (product)

This is the product; if there is a large number of CPEs for vendors, we still take what Mitre has published as the base product. Sometimes for consistency, we will change the capitalization of the product.

string

Vulnerability Type (vulnerability_type)

This is scarcely added, as you can get some of this information from the CWE.

string

Description (description)

This is mostly the NVD/Mitre description of the CVE.

string / note

Date Added (date_added)

That is when this was added to the list.

string, then converted to date time on the backend

Reference (reference)

This is mostly the NVD reference unless it’s not easily found or there is a lot better reference.

string

CVSSv2

This is a different scoring system, so it makes sense to use CVSSv2

boolean

CVSSv3

This is for CVEs greater than 2016.

boolean

CVSSv4

Future - we will have this option as well

boolean

id_ref

This is a monotonic increasing number used by API and other services to know when we have new CVEs that have to be notified.

integer

Updated On (updated_on)

Today this isn’t a good indicator, as we are generating the updates and checks to the list daily, and thus the field is current. In the future, we want this to be exactly that - updated on.

string, then converted to date time on the backend

CWE (we)

The standard for Common Weakness Enumeration (CWE), where possible we try to retrofit a CWE if one is not properly found.

string

Honeypot (honeypot)

This is the date/time within reason on where we saw the first hit of this CVE exploit.

string, then on the back-end is converted to a date-time object

Intel (Intel)

This is the date/time within reason on where we saw the automated intel hit of this CVE exploit.

string, then on the back-end is converted to a date-time object

Research (research)

This is the date/time within reason on where we an analysts did manual research to find the first hit of this CVE exploit

string, then on the back-end is converted to a date-time object

Processed

This is a flag to say whether you have processed this CVE entry. It’s helpful for integrating what you have already processed and not. Within the UI, you can show all entries, processed entries as grayed out and hide the processed entries.

boolean

Filter expression language

There are two filter expression languages:

  • Keyword search is like a simple Google string search, and it will search for all fields matching that string. In the UI, just type anything and hit enter. In the API, just make sure you are using the right quoting. The simple view is to use single quotes on the outside and double quotes on the inside of the query, or vice-versa, do not mix as this is like crossing the streams in Ghostbusters. It’s very bad.

  • A pandas query-like language with a few notable changes/exceptions.

Pandas (Python Data Analysis Library) is an awesome library written in Python. In fact, we don’t think there are many cooler animals than pandas.

If you work in cybersecurity, then learning Python computing language can make you stronger, faster, and able to leap security incidents in a single bound.

We retrieve the stored filter from a database; then we process ATI - Early Warning using the Pandas library query operators with a few notable changes:

  • The ability to return local variables, functions using the @ operation has been allowed to only a small number of variables.

  • There is no need to do double equals, though if you still want to, then knock yourself out:

so instead of writing product=="Cisco", we can write product="Cisco".
  • Case insensitive by default, we do all the hard work making things lower case and checking everything is compared against values in lower case.

  • We do trimming data so that spaces before words are ignored and such, as data in the Mitre CVE data is messy.

  • It is still necessary for specific type operations, such as string operations, to put .str after the column, then the operand, for example:

More Documentation to help you

As mentioned previously, we use Pandas for doing the filtering. Well, there is a wealth of knowledge with this library - happy days! In ATI - Early Warning, there are columns returned, and in Pandas, they are called Series. Series = Column, so anytime you see Series think of column name. If a command has Series.str.contains(), think this as replacing Series with your column. It becomes description.str.contains(“memory corruption”) - this would return all rows that have memory corruption within their description.

Go to: https://pandas.pydata.org/docs/index.html.

Pandas Website

 

Then search for contains.

Search results for contains

As we mentioned before, Series are columns in our world; we are looking for pandas.Series.<what we are after>.

 

Note: As previously mentioned, we do everything by default in lower case, so when using contains, use the lowercase string of what you are searching for. Notice the documentation also brings up, match, ‘startswith’ and ‘endswith.’

Query Language

Order of operation

This is python plus pandas, so the order of operation is sometimes a little bit undefined. It is best to put parentheses () around operations if you need to ensure they are doing what you expect.

Greater than, Less than, and Equals

In the filter language:

Greater than is >

Less than is <

Equals are =

So to do something greater than and equal to is >=

To do something less than and equal to is <=

Escape Operator - Your get out of jail free card.

By default, we replace = with == so that you don’t have to remember to use == every time you want to do a test. In some cases, you don’t want to have this expansion. We support an ` as the escape, so if you type in `=, then on the backend, we won’t make it ==. Raw is supported as well, but that brings up a whole different can of worms, case sensitivity, etc.

Logical Operators (and, or, and not)

And - is where two conditions must be true. Luke Skywalker and the force.

Or - is where one of the conditions can be true. Luke Skywalker or a Jedi Knight.

Not - converts the expression to the opposite, true becomes false, and false becomes true. Luke Skywalker is not a member of the sith.

Writing filters is about returning a true or false per row, so you will be using logical operators all the time.

Examples:

Logical Or Operator Example

Give me all ATI - Early Warning entries where the vendor is Cisco or Fortinet.

Logical And Operator Example

Give me all Adobe flash vulnerabilities that threat actors are using before 2016 and

Logical Not Operator Example

Give me all Adobe flash vulnerabilities that threat actors are using that are not Critical.

Testing for the empty value we use is a null check

To test for empty strings, values, we need to call a function, isnull(). So you put your field at the front of it, intel.isnull() and butter bing, butter bam, you have a check if empty test. To test for strings that are not null, we can use notnull()

All the columns need this test and reverse test, not <column_name>.isnull() to get all non-empty values.

For example, give me where there is no Intel, and there is Research.

Isin - The cool kids club

Isin operator is for checking if an item is in a list of items.

The isin([<list of items>]) is great for getting all values in or not in a list.

For example:

You can return all the CVEs where the NVD Severity (base_severity) is Low or Medium.

You can return all the CVEs that are not critical. Just put a not in front of the isin condition.

You could return all the CVEs in certain years.

Look what I can do with big numbers.

Pandas mathematical expressions off the hook! There are lots of cool expressions we can write.

Important consideration - We are doing filtering queries that check if each row is true, and if it is, the row is returned. This means all expressions must return a boolean expression, a yes or no, true or false.

So we can do an expression such as base_score.min() on its own, we need to do a comparison operation, this would become base_score<=base_score.min(). This will return the lowest NVD Scored CVE that is being exploited in the wild. [Technically, we could have just done = instead of <=]

We can do things like give use all entries where Oracle is greater than the min NVD Score (base_score) for a vendor.

More likely, you want to do it in quantiles, for example, covering 95% or the bottom 5% or the middle, 50%. Quantiles are in decimal point, so 5% is 05, 50% is 0.5 and 95% is 0.95. To get all Microsoft CVEs that meet the bottom 5% of all NVD Scores.

The rank is a fancy operator in Pandas; this will return the lowest entries that are true outliers based on their NVD Score (base_score). Show that query to your boss; you might get a raise.

Another way in the list for outliers is to find entries that are 3 the standard deviation. This is near 0.1% of the entries.

Get all ATI - Early Warning entries that are at the average (mean). Say the average was 7.76534234 if we did the query base_score==base_score.mean(), we wouldn’t return any rows as the base_scores are one decimal point. So we have to do a plus or minus, of a small distance. See below, where we put in a plus/minus 0.1. Now you can, of course, get all greater than the mean, which you don’t need to do the plus/minus trick, as you are not doing an exact equal to.

Median is handy to find the most common NVD Score vulnerabilities for that vendor.

So we know there are lots of mathematical operations. We explored max, min, median, mean (average), standard deviation (std), quantile, and rank.

String Operations

A lot of the columns use strings, such as vendor, product, description, base_severity, and CWE are strings. You can do lots of operations on strings. To do the operations, you get the column name in lower case or the special column name such as base_severity for NVD severity, then put .str, then put the operation.

Most commonly used string operations.

Operation (<column name>.str.<operation>)

Details

Operation (<column name>.str.<operation>)

Details

isnull()

If empty

not <column name>.str.isnull()

If not empty. If it has something in it!

contains(“<string>”)

If it contains the <string>, for example description.str.contains(“remote code execution”) where return all the ATI - Early Warning entries where the description contains remote code execution. Contains also supports regex - way cool!

startswith(“<string>”)

Suppose the entry for the column starts with <string>, for example, where vendor.str.startswith(“Palo”), to return all the vendors that start with Palo - should return Palo Alto Networks.

endswith(“<string>”)

If the entry for the column ends with <string>, for example, where product.str.endswith(“Player”), to return all the products that end with Player, would return things like Flash Player and such.

len()

Returns the length of the entry, so you could find all products where the length of the product name is less than 5, product.str.len()<=5

slice()

slice(start, stop)

isnum()

Is the string a number

isalphanum()

Is the string alphanumeric, numbers, and letters

cat()

Concatenate a string with a string

get()

Get a character at the index value. Indexes start at zero.

join()

Join a string with a string.

split()

Split a string on a delimiter

rsplit()

Split strings on delimiter working from the end of the string

replace()

Replace occurrences of pattern/regex/string with some other string or the return value of a callable given the occurrence

strip()

Strips spaces or whatever value you pass it

rstrip()

Strips spaces or whatever value you pass it from the right-hand side

lstrip()

Strips spaces or whatever value you pass it from the left-hand side

Other string operations

Operation (<column name>.str.<operation>)

Details

Operation (<column name>.str.<operation>)

Details

repeat()

Duplicate values (<column>.str.repeat(3) equivalent to x * 3)

pad()

Add whitespace to left, right, or both sides of strings

zfill()

Pad strings in the Series/Index by prepending ‘0’ characters.

slice_replace()

Replace slice in each string with the passed value

count()

Count occurrences of pattern

findall()

Compute a list of all occurrences of pattern/regex for each string

match()

Call re.match on each element, returning matched groups as a list

extract()

Call re.search on each element, returning DataFrame with one row for each element and one column for each regex capture group

extractall()

Call re.findall on each element, returning DataFrame with one row for each match and one column for each regex capture group

lower()

Make the string lower

casefold()

Make the string have a case fold

upper()

Make the string upper case

find()

Return the index of a string find

capitalize()

Return the string capitalized

swapcase()

Return the string with the case swapped

isalnum()

Returns whether all characters are alphanumeric

isalpha()

Returns whether all characters are alphabetic

isdigit()

Returns whether all characters are digits

isspace()

Returns whether all characters are spaces

islower()

Returns whether the string is lower

isupper()

Returns whether the string is upper

istitle()

Returns whether the string is in title case

isnumeric()

Returns whether the string is numeric

isdecimal()

Returns whether the string is decimal

There are even more at https://pandas.pydata.org/docs/index.html

Dates and Times

Dates and times are always tricky; sometimes, the simple things are the hardest things to do. On top of that, we have timezone operations, and such can further complicate the matter.

Comparing Dates and Times

You can compare one date time column against another.

For example, say you wanted to find out how many ATI - Early Warning entries had Intel, Research or Honeypot hits on or before the CVE date?

How about when did we have Research before Intel

Dates and times can also be treated as strings and then converted to dates on the comparison. For example, how many ATI - Early Warning entries have CVE published date between 2016 and 2018?

You can also do mathematical operations such as min(), max(). For example, what ATI - Early Warning entry is the earliest CVE published CVE?