This is how to search and or filter across the ATI - Early Warning list. If the expression matches the criteria, then these rows are returned. The search filter commands are 'packed-to-the-rafters' with features.

The search filters explained today are in the Intelligence Center. The console has more advanced CVE expressions in ASQ.

We support the simplest search - a keyword search such as “remote code execution” as well as fully-blown filter expressions that have over 80 transformations. The 🔑 thing about search is that they have to return a true or false value for every row, so think about it in these terms, what can you filter for or filter out based on some expression.

Some important considerations

Notification Filters are by default case insensitive; we do this as CVE Database has lots of different ways vendors and products are spelled with mixed cases. This also means you can search for words with whatever typing you want, such as ‘Microsoft, ‘Microsoft,’ etc. We also support raw search; however, we think in most cases, you won’t have to do this. If raw is something you do a lot of in the future, let us know, and we will make your use case some new transformations.

Filters can use either keyword search or filter expressions.

Some examples of where it will work and not work are:

"remote code execution" is a keyword search, is allowed
vendor="Microsoft" and "remote code execution" will not work
vendor="Microsoft" and description.str.contains("remote code execution") will work

Please Note:

An empty filter returns the whole ATI - Early Warning list, an empty filter means don’t filter the list.

An example

An example of writing a search for Cisco.

Such as these filters:

"Internal Network Team" : vendor="cisco"

"External Supported Network Team" : vendor.str.startswith("palo"")

Understanding Filters

There are three parts to understanding filters. The first is the name of the columns and what they mean, and what type they are as certain types such as data fields only allow certain operations and the filter expression language.

ATI - Early Warning Columns

The ATI - Early Warning has a number of columns and definitions.

Column	Description	Type

Column	Description	Type
CVE (cve)	The CVE is usually the Mitre / NVD CVE. However, there are times where CTCI will create a CVE or prefix the CVE with a few letters such as R for Research, I for Intel, H for Honeypot, and p for provisional. An example would be HR-CVE-XXXX-XX, where we have a CVE we see that has matched our Honeypot and has been researched.	string
Year (year_)	The year of the CVE, now we usually extract this from the CVE-YYYY-XXX, where YYYY is the year. However, there are a few CVEs that have been retrospective given different numbers. A year can be filtered and grouped on.	integer
NVD Score (base_score)	This is the NVD base score for this CVE. For unknown values, sometimes this is set to Unknown or sometimes set to a value. When it is, we will add that to the External note.	float
CVE Date (cve_date)	This was when the CVE was published to Mitre, which may or may not had had any details or scoring when it was published.	the string then converted to a date-time object in the back end
NVD Severity (base_severity)	This is the Severity that is found in NVD. When it is an unknown value, we sometimes then set this value. If we do, then this is added to that External note.	string
Vendor (vendor)	This is the vendor; if there is a large number of CPEs for vendors, we still take what Mitre has published. Sometimes for consistency, we will change the capitalization of the Vendor.	string
Product (product)	This is the product; if there is a large number of CPEs for vendors, we still take what Mitre has published as the base product. Sometimes for consistency, we will change the capitalization of the product.	string
Vulnerability Type (vulnerability_type)	This is scarcely added, as you can get some of this information from the CWE.	string
Description (description)	This is mostly the NVD/Mitre description of the CVE.	string / note
Date Added (date_added)	That is when this was added to the list.	string, then converted to date time on the backend
Reference (reference)	This is mostly the NVD reference unless it’s not easily found or there is a lot better reference.	string
CVSSv2	This is a different scoring system, so it makes sense to use CVSSv2	boolean
CVSSv3	This is for CVEs greater than 2016.	boolean
CVSSv4	Future - we will have this option as well	boolean
id_ref	This is a monotonic increasing number used by API and other services to know when we have new CVEs that have to be notified.	integer
Updated On (updated_on)	Today this isn’t a good indicator, as we are generating the updates and checks to the list daily, and thus the field is current. In the future, we want this to be exactly that - updated on.	string, then converted to date time on the backend
CWE (we)	The standard for Common Weakness Enumeration (CWE), where possible we try to retrofit a CWE if one is not properly found.	string
Honeypot (honeypot)	This is the date/time within reason on where we saw the first hit of this CVE exploit.	string, then on the back-end is converted to a date-time object
Intel (Intel)	This is the date/time within reason on where we saw the automated intel hit of this CVE exploit.	string, then on the back-end is converted to a date-time object
Research (research)	This is the date/time within reason on where we an analysts did manual research to find the first hit of this CVE exploit	string, then on the back-end is converted to a date-time object
Processed	This is a flag to say whether you have processed this CVE entry. It’s helpful for integrating what you have already processed and not. Within the UI, you can show all entries, processed entries as grayed out and hide the processed entries.	boolean

Filter expression language

There are two filter expression languages:

Keyword search is like a simple Google string search, and it will search for all fields matching that string. In the UI, just type anything and hit enter. In the API, just make sure you are using the right quoting. The simple view is to use single quotes on the outside and double quotes on the inside of the query, or vice-versa, do not mix as this is like crossing the streams in Ghostbusters. It’s very bad.
A pandas query-like language with a few notable changes/exceptions.

Pandas (Python Data Analysis Library) is an awesome library written in Python. In fact, we don’t think there are many cooler animals than pandas.

If you work in cybersecurity, then learning Python computing language can make you stronger, faster, and able to leap security incidents in a single bound.

We retrieve the stored filter from a database; then we process ATI - Early Warning using the Pandas library query operators with a few notable changes:

The ability to return local variables, functions using the @ operation has been allowed to only a small number of variables.
There is no need to do double equals, though if you still want to, then knock yourself out:

so instead of writing product=="Cisco", we can write product="Cisco".

Case insensitive by default, we do all the hard work making things lower case and checking everything is compared against values in lower case.
We do trimming data so that spaces before words are ignored and such, as data in the Mitre CVE data is messy.
It is still necessary for specific type operations, such as string operations, to put .str after the column, then the operand, for example:

Query Language

Order of operation

This is python plus pandas, so the order of operation is sometimes a little bit undefined. It is best to put parentheses () around operations if you need to ensure they are doing what you expect.

Greater than, Less than, and Equals

In the filter language:

Greater than is >

Less than is <

Equals are =

So to do something greater than and equal to is >=

To do something less than and equal to is <=

Escape Operator - Your get out of jail free card.

By default, we replace = with == so that you don’t have to remember to use == every time you want to do a test. In some cases, you don’t want to have this expansion. We support an ` as the escape, so if you type in `=, then on the backend, we won’t make it ==. Raw is supported as well, but that brings up a whole different can of worms, case sensitivity, etc.

Logical Operators (and, or, and not)

And - is where two conditions must be true. Luke Skywalker and the force.

Or - is where one of the conditions can be true. Luke Skywalker or a Jedi Knight.

Not - converts the expression to the opposite, true becomes false, and false becomes true. Luke Skywalker is not a member of the sith.

Writing filters is about returning a true or false per row, so you will be using logical operators all the time.

Examples:

Logical Or Operator Example

Give me all ATI - Early Warning entries where the vendor is Cisco or Fortinet.

Logical And Operator Example

Give me all Adobe flash vulnerabilities that threat actors are using before 2016 and

Logical Not Operator Example

Give me all Adobe flash vulnerabilities that threat actors are using that are not Critical.

Testing for the empty value we use is a null check

To test for empty strings, values, we need to call a function, isnull(). So you put your field at the front of it, intel.isnull() and butter bing, butter bam, you have a check if empty test. To test for strings that are not null, we can use notnull()

All the columns need this test and reverse test, not <column_name>.isnull() to get all non-empty values.

For example, give me where there is no Intel, and there is Research.

Isin - The cool kids club

Isin operator is for checking if an item is in a list of items.

The isin([<list of items>]) is great for getting all values in or not in a list.

For example:

You can return all the CVEs where the NVD Severity (base_severity) is Low or Medium.

You can return all the CVEs that are not critical. Just put a not in front of the isin condition.

You could return all the CVEs in certain years.

Look what I can do with big numbers.

Pandas mathematical expressions off the hook! There are lots of cool expressions we can write.

Important consideration - We are doing filtering queries that check if each row is true, and if it is, the row is returned. This means all expressions must return a boolean expression, a yes or no, true or false.

So we can do an expression such as base_score.min() on its own, we need to do a comparison operation, this would become base_score<=base_score.min(). This will return the lowest NVD Scored CVE that is being exploited in the wild. [Technically, we could have just done = instead of <=]

We can do things like give use all entries where Oracle is greater than the min NVD Score (base_score) for a vendor.

More likely, you want to do it in quantiles, for example, covering 95% or the bottom 5% or the middle, 50%. Quantiles are in decimal point, so 5% is 05, 50% is 0.5 and 95% is 0.95. To get all Microsoft CVEs that meet the bottom 5% of all NVD Scores.

The rank is a fancy operator in Pandas; this will return the lowest entries that are true outliers based on their NVD Score (base_score). Show that query to your boss; you might get a raise.

Another way in the list for outliers is to find entries that are 3 the standard deviation. This is near 0.1% of the entries.

Get all ATI - Early Warning entries that are at the average (mean). Say the average was 7.76534234 if we did the query base_score==base_score.mean(), we wouldn’t return any rows as the base_scores are one decimal point. So we have to do a plus or minus, of a small distance. See below, where we put in a plus/minus 0.1. Now you can, of course, get all greater than the mean, which you don’t need to do the plus/minus trick, as you are not doing an exact equal to.

Median is handy to find the most common NVD Score vulnerabilities for that vendor.

So we know there are lots of mathematical operations. We explored max, min, median, mean (average), standard deviation (std), quantile, and rank.

String Operations

A lot of the columns use strings, such as vendor, product, description, base_severity, and CWE are strings. You can do lots of operations on strings. To do the operations, you get the column name in lower case or the special column name such as base_severity for NVD severity, then put .str, then put the operation.

Most commonly used string operations.

Operation (<column name>.str.<operation>)	Details

Operation (<column name>.str.<operation>)	Details
isnull()	If empty
not <column name>.str.isnull()	If not empty. If it has something in it!
contains(“<string>”)	If it contains the <string>, for example description.str.contains(“remote code execution”) where return all the ATI - Early Warning entries where the description contains remote code execution. Contains also supports regex - way cool!
startswith(“<string>”)	Suppose the entry for the column starts with <string>, for example, where vendor.str.startswith(“Palo”), to return all the vendors that start with Palo - should return Palo Alto Networks.
endswith(“<string>”)	If the entry for the column ends with <string>, for example, where product.str.endswith(“Player”), to return all the products that end with Player, would return things like Flash Player and such.
len()	Returns the length of the entry, so you could find all products where the length of the product name is less than 5, product.str.len()<=5
slice()	slice(start, stop)
isnum()	Is the string a number
isalphanum()	Is the string alphanumeric, numbers, and letters
cat()	Concatenate a string with a string
get()	Get a character at the index value. Indexes start at zero.
join()	Join a string with a string.
split()	Split a string on a delimiter
rsplit()	Split strings on delimiter working from the end of the string
replace()	Replace occurrences of pattern/regex/string with some other string or the return value of a callable given the occurrence
strip()	Strips spaces or whatever value you pass it
rstrip()	Strips spaces or whatever value you pass it from the right-hand side
lstrip()	Strips spaces or whatever value you pass it from the left-hand side

Other string operations

Operation (<column name>.str.<operation>)	Details

Operation (<column name>.str.<operation>)	Details
repeat()	Duplicate values (<column>.str.repeat(3) equivalent to x * 3)
pad()	Add whitespace to left, right, or both sides of strings
zfill()	Pad strings in the Series/Index by prepending ‘0’ characters.
slice_replace()	Replace slice in each string with the passed value
count()	Count occurrences of pattern
findall()	Compute a list of all occurrences of pattern/regex for each string
match()	Call re.match on each element, returning matched groups as a list
extract()	Call re.search on each element, returning DataFrame with one row for each element and one column for each regex capture group
extractall()	Call re.findall on each element, returning DataFrame with one row for each match and one column for each regex capture group
lower()	Make the string lower
casefold()	Make the string have a case fold
upper()	Make the string upper case
find()	Return the index of a string find
capitalize()	Return the string capitalized
swapcase()	Return the string with the case swapped
isalnum()	Returns whether all characters are alphanumeric
isalpha()	Returns whether all characters are alphabetic
isdigit()	Returns whether all characters are digits
isspace()	Returns whether all characters are spaces
islower()	Returns whether the string is lower
isupper()	Returns whether the string is upper
istitle()	Returns whether the string is in title case
isnumeric()	Returns whether the string is numeric
isdecimal()	Returns whether the string is decimal

There are even more at https://pandas.pydata.org/docs/index.html

Dates and Times

Dates and times are always tricky; sometimes, the simple things are the hardest things to do. On top of that, we have timezone operations, and such can further complicate the matter.

Comparing Dates and Times

You can compare one date time column against another.

For example, say you wanted to find out how many ATI - Early Warning entries had Intel, Research or Honeypot hits on or before the CVE date?

How about when did we have Research before Intel

Dates and times can also be treated as strings and then converted to dates on the comparison. For example, how many ATI - Early Warning entries have CVE published date between 2016 and 2018?

You can also do mathematical operations such as min(), max(). For example, what ATI - Early Warning entry is the earliest CVE published CVE?

CTCI Documentation

ATI - Early Warning: Filtering and Searching