Analyzing Malicious Windows Programs
What is the Windows API?
A broad set of functionality that governs the way that malware interacts with the Microsoft libraries
Uses its own names to represent C types
Hungarian Notation
used for API function identifiers
Uses a prefix naming scheme that makes it easy to identify a variable's type
What are handles?
Items that have been opened or created in the OS, such as a window, process, module, menu, or file
Cannot be used in arithmetic operations
Do not always represent the object's address
The only thing you can do with a handle is store it and use it in a later function call to refer to the same object
Example: the CreateWindowEx function returns an HWND, which is a handle to a window
Common ways that malware interacts with the system:
creating or modifying files
Distinct filenames
Changes to existing filenames
Functions for accessing the file system
CreateFile
Used to create and open files
Can open existing files, pipes, streams, and I/O devices
Can also create new files
The dwCreationDisposition parameter controls whether the function creates a new file or opens an existing one
ReadFile and WriteFile
Used for reading and writing to files
Operate on files as a stream
CreateFileMapping and MapViewOfFile
File mappings are commonly used by malware writers because they allow a file to be loaded into memory and manipulated easily
CreateFileMapping - loads a file from disk into memory
MapViewOfFile - returns a pointer to the base address of the mapping, which can be used to access the file in memory
Malware calling these functions could use the pointer returned from MapViewOfFile to read and write anywhere in the file
Handy when parsing a file format
Malware can obtain a map of the file, make changes in memory, and execute the PE file as if it had been loaded by the OS loader
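The read-and-write-anywhere property of a file mapping can be sketched in a platform-neutral way with Python's standard mmap module. This is only an illustration of the idea, not the Windows calls themselves; the helper names patch_mapped_file and demo_patch are hypothetical.

```python
import mmap
import os
import tempfile


def patch_mapped_file(path, offset, new_bytes):
    """Map a whole file into memory and overwrite bytes at an arbitrary offset.

    Returns the bytes that were there before the patch.
    """
    with open(path, "r+b") as f:
        # Length 0 maps the entire file; the view behaves like a byte array.
        with mmap.mmap(f.fileno(), 0) as view:
            end = offset + len(new_bytes)
            original = bytes(view[offset:end])
            view[offset:end] = new_bytes  # random access, no seek/read/write calls
            view.flush()                  # push the change back to the file
    return original


def demo_patch():
    """Create a small scratch file, patch two bytes through the mapping, read it back."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"MZ" + b"\x00" * 62)
        old = patch_mapped_file(path, 2, b"\x90\x90")
        with open(path, "rb") as f:
            head = f.read(4)
        return old, head
    finally:
        os.remove(path)
```

Once the view exists, any offset in the file is reachable with ordinary indexing, which is why mappings are convenient for parsing or rewriting a file format.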
Special files can be accessed much like regular files, but they are not accessed by their drive letter and folder
Stealthier than regular ones because they don't show up in directory listings
Provide greater access to system hardware and internal data
Can be passed as strings to any of the file-manipulation functions and operate on a file as if it were a normal file
Shared files are special files with names that start with \\serverName\share or \\?\serverName\share
Access directories or files in a shared folder stored on a network
The \\?\ prefix tells the OS to disable all string parsing and allows access to longer filenames
Namespaces can be thought of as a fixed number of folders, each storing different types of objects
NT Namespace
The lowest-level namespace is the NT namespace, with the prefix \
The NT namespace has access to all devices, and all other namespaces exist within it
Win32 device namespace
Prefix \\.\
Often used by malware to access physical devices directly, and read and write to them like a file
Example: a program could use \\.\PhysicalDisk1 to access PhysicalDisk1 directly, ignoring its file system, and modify the disk in ways not possible through the normal API
Malware might be able to read and write data to an unallocated sector without creating or accessing files, which allows it to avoid detection by AV and security programs
The Witty worm accessed \Device\PhysicalDisk1 via the NT namespace to corrupt its victim's file system
It would open the device and write to a random space on the drive at regular intervals, eventually corrupting the victim's OS and rendering it unable to boot
Malware can also access physical memory directly, which allows user-space programs to write to kernel space
This technique is used by malware to modify the kernel and hide programs in user space
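When triaging strings extracted from a sample, it can help to sort paths by the prefixes described above. The following is a minimal sketch; the function name classify_windows_path and the category labels are hypothetical, and the check is purely textual.

```python
def classify_windows_path(path):
    """Classify a path string by its namespace prefix (textual check only)."""
    if path.startswith("\\\\.\\"):
        # \\.\ is the Win32 device namespace, e.g. \\.\PhysicalDisk1
        return "win32-device-namespace"
    if path.startswith("\\\\?\\"):
        # \\?\ disables string parsing and allows longer filenames
        # (it can also precede a serverName\share path)
        return "parsing-disabled-prefix"
    if path.startswith("\\\\"):
        # \\serverName\share style network path
        return "shared-file"
    if path.startswith("\\Device\\"):
        # NT namespace device path, e.g. \Device\PhysicalDisk1
        return "nt-namespace"
    return "regular"
```

A hit in the device or NT namespace categories is worth a closer look, since ordinary applications seldom open raw disks or physical memory.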
The Alternate Data Streams (ADS) feature allows additional data to be added to an existing file within NTFS, essentially adding one file to another
Extra data does not show up in a directory listing and it is not shown when displaying the contents of the file; only visible when you access the stream
Named according to the convention normalFile.txt:Stream:$DATA, which allows a program to read and write to a stream
Malware authors like ADS because it can be used to hide data
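The naming convention can be checked mechanically when reviewing file names passed to file-manipulation functions. This sketch splits a name of the form normalFile.txt:Stream:$DATA into its parts; the function names are hypothetical, and it only inspects the string, it does not touch NTFS.

```python
def parse_stream_name(path):
    """Split 'file:stream:type' into (file_name, stream_name, stream_type).

    Only the last path component is examined, so a drive letter such as
    'C:' in 'C:\\dir\\file.txt' is not mistaken for a stream separator.
    """
    leaf = path.replace("/", "\\").rsplit("\\", 1)[-1]
    parts = leaf.split(":")
    file_name = parts[0]
    stream_name = parts[1] if len(parts) > 1 else ""
    # Assumption: when no type is given, treat it as the default $DATA type.
    stream_type = parts[2] if len(parts) > 2 else "$DATA"
    return file_name, stream_name, stream_type


def has_named_stream(path):
    """True if the path refers to a named (alternate) stream."""
    return parse_stream_name(path)[1] != ""
```

For example, has_named_stream applied to a plain path like C:\tmp\a.txt is false, while any name carrying a stream component after the file name is flagged.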
Malware often uses the Registry for persistence or configuration data
Malware adds entries into the registry that will allow it to run automatically when the computer boots
Writing entries to the Run subkey sets up software to run automatically; malware often uses this to launch itself
Malware uses registry functions that are part of the Windows API to modify the registry to run automatically when the system boots
Common Functions:
RegOpenKeyEx - opens a registry key for editing and querying
RegSetValueEx - adds a new value to the registry and sets its data
RegGetValue - returns the data for a value entry in the registry
If you see these in malware, you need to identify the registry keys they are accessing
Files with a .reg extension contain human-readable registry data and act like scripts for changing the registry
When a user double-clicks a .reg file, it automatically modifies the registry by merging the information the file contains into the registry
Malware sometimes uses .reg files to modify the registry
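Because .reg files are plain text, an analyst can scan one for autorun entries without importing it. The sketch below assumes the common layout of a bracketed key line followed by "name"="data" string values; the function name find_run_entries and the sample content are made up for illustration, and only string values are handled.

```python
import re

# Raw string so the backslashes appear exactly as they would in the file.
SAMPLE_REG = r'''Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run]
"Updater"="C:\\Temp\\updater.exe"

[HKEY_CURRENT_USER\Software\Example]
"Setting"="1"
'''

RUN_SUFFIX = "\\currentversion\\run"
VALUE_LINE = re.compile(r'^"([^"]+)"="(.*)"$')


def find_run_entries(reg_text):
    """Return (key, value_name, data) for string values under a ...\\Run key."""
    entries = []
    current_key = None
    for raw in reg_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current_key = line[1:-1]  # remember which key the next values belong to
            continue
        match = VALUE_LINE.match(line)
        if match and current_key and current_key.lower().endswith(RUN_SUFFIX):
            name, data = match.groups()
            # Backslashes are doubled inside .reg string data; undo that.
            entries.append((current_key, name, data.replace("\\\\", "\\")))
    return entries
```

Calling find_run_entries(SAMPLE_REG) reports only the Updater value, since the second key is not a Run subkey.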
Malware relies on network functions to do its dirty work
Malware most commonly uses Berkeley compatible sockets (primarily implemented in ws2_32.dll)
The WSAStartup function has to be called before any other networking functions to allocate resources for the networking libraries
When debugging, set a breakpoint on WSAStartup to find where networking begins
Server side - maintains an open socket waiting for incoming connections
Client side - connects to a waiting socket
socket call
connect call
send/recv calls
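The client and server roles above follow the same Berkeley sockets pattern on any platform. As a rough, benign illustration (Python's socket module rather than ws2_32.dll, loopback only, hypothetical function names), the sequence looks like this:

```python
import socket
import threading


def serve_once(listener):
    """Server side: accept one connection, read a message, send a reply."""
    conn, _addr = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(b"ack:" + data)


def loopback_demo():
    # Server side: socket, bind, listen, then wait in accept on a thread.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()

    # Client side: socket call, connect call, then send/recv calls.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))
    client.sendall(b"hello")
    chunks = []
    while True:
        chunk = client.recv(1024)
        if not chunk:  # empty read means the server closed the connection
            break
        chunks.append(chunk)
    client.close()
    server.join()
    listener.close()
    return b"".join(chunks)
```

In disassembly, seeing bind, listen, and accept points to the server role, while socket followed directly by connect points to the client role.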
The WinINet API is a higher-level API
Functions are stored in Wininet.dll
Implements protocols like HTTP and FTP at the application layer
You can gain an understanding of what malware is doing based on connections it opens
InternetOpen - used to initialize a connection to the Internet
InternetOpenUrl - used to connect to a URL
InternetReadFile - allows the program to read the data from a file downloaded from the Internet
Malware can use this to connect to a remote server and get further instructions for execution
First and most common way to access code outside a single file is through the use of DLLs
Dynamic Link Libraries (DLLs)
Windows' way to use libraries to share code among multiple applications
An executable file that does not run alone, but exports functions that can be used by other applications.
Main advantages
Memory used by the DLLs can be shared among running processes
When distributing an executable, you can use DLLs that are known to be on the host Windows system without needing to redistribute them
DLLs are a useful code-reuse mechanism
Maintain a single library of common code and distribute it only when needed.
To store malicious code
Store malicious code in a DLL rather than in an .exe
Malware sometimes uses DLLs to load itself into another process
By using Windows DLLs
Functionality needed to interact with the OS
By using third-party DLLs
Malware can use third-party DLLs to interact with other programs
Example - use the Mozilla Firefox DLL to connect back to a server, rather than connecting directly through the Windows API
DLLs use the PE file format
Only a single flag indicates that the file is a DLL
Often have more exports and fewer imports
Other than these there is no real difference between a DLL and an .exe
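The single flag mentioned above is the IMAGE_FILE_DLL bit (0x2000) in the Characteristics field of the PE file header. A minimal sketch of checking it from raw bytes follows; the function names are hypothetical, and make_minimal_header builds a synthetic header purely so the check can be exercised without a real binary.

```python
import struct

IMAGE_FILE_DLL = 0x2000  # Characteristics bit that marks a DLL


def is_dll(data):
    """Return True if the PE header bytes carry the IMAGE_FILE_DLL flag."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C points to the 'PE\0\0' signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # Characteristics is the last 2 bytes of the 20-byte file header
    # that follows the 4-byte signature.
    (characteristics,) = struct.unpack_from("<H", data, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_DLL)


def make_minimal_header(characteristics):
    """Build a synthetic DOS stub plus PE file header for demonstration only."""
    dos = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x40)
    file_header = struct.pack("<HHIIIHH", 0x14C, 0, 0, 0, 0, 0, characteristics)
    return dos + b"PE\x00\x00" + file_header
```

On a real sample the same function can be given the first few hundred bytes of the file read in binary mode.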
Main DLL function: DllMain
It has no label
It is not an export in the DLL, but it is specified in the PE header as the file's entry point
Function is called to notify the DLL whenever a process
Loads or unloads the library
Creates a new thread
Finishes an existing thread
This notification allows the DLL to manage any per-process or per-thread resources
Malware can execute code outside the current program by creating a new process or modifying an existing one
Windows uses processes as containers to manage resources and keep separate programs from interfering with each other
Each process is given a memory space that is separate from all other processes and that is the set of memory addresses the process can use
When the process requires memory, the OS allocates memory and gives the process an address that it can use to access the memory
Different processes can use the same memory addresses without conflict
The addresses are the same, but the physical memory that stores the data is not the same
A malicious program that accesses a memory address will affect only what is stored at that address for the process that contains the malicious code
CreateProcess - the function most commonly used by malware to create a new process
Malware could call this function to create a process to execute its malicious code, in order to bypass host-based firewalls and other security mechanisms
Commonly used by malware to create a simple remote shell with just a single function call
The STARTUPINFO structure includes a handle to the standard input, standard output, and standard error streams for a process
Malicious programs could set these values to a socket, so that when the program writes to standard output, it is really writing to the socket, allowing an attacker to execute a shell remotely without running anything other than the call to CreateProcess
The call to CreateProcess creates a new process so that all input and output are redirected to a socket
Malware often creates a new process by storing one program inside another in the resource section
When the program runs, it extracts the additional executable from its resource section, writes it to disk, and then calls CreateProcess to run the program
Processes contain threads
Threads are what the Windows OS executes
Threads are independent sequences of instructions that are executed by the CPU without waiting for other threads
Threads within a process all share the same memory space, but each has its own processor registers and stack
Running threads have complete control of the CPU
When an OS switches between threads, all values in the CPU are saved in a structure (thread context)
CreateThread is used to create new threads
The caller specifies a start address, often called the start function
Execution begins at the start address and continues until the function returns
The caller of CreateThread can specify the function where the thread starts and a single parameter to be passed to the start function
Malware can use CreateThread in multiple ways
Used to load a new malicious library into a process
The address of LoadLibrary is specified as the start address
The argument passed to CreateThread is the name of the library to be loaded
The new DLL is loaded into memory in the process and DllMain is called
Create two new threads for input and output
One to listen on a socket or pipe and then output that to standard input of a process
The other to read from standard output and send that to a socket or pipe
Goal is to send all information to a single socket or pipe in order to communicate seamlessly with the running application
Fibers are like threads, but are managed by a thread, rather than by the OS
Mutexes, also called mutants when in the kernel, are global objects that coordinate multiple processes and threads
Mainly used to control access to shared resources
If two threads must access a memory structure, but only one can safely access it at a time, a mutex can be used to control access
Only one thread can own a mutex at a time
Important to malware analysis because they often use hard-coded names, making them good host-based indicators
Hard-coded names are common because a mutex's name must be consistent if it is used by two processes
A thread gains access to the mutex with a call to WaitForSingleObject
When a thread is done using a mutex, it calls ReleaseMutex
CreateMutex creates a mutex
Malware will commonly create a mutex and try to open an existing mutex with the same name to make sure that only one instance of the malware is running at a time
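The one-owner-at-a-time rule can be illustrated with Python's threading.Lock, which plays roughly the role that WaitForSingleObject and ReleaseMutex play for a Windows mutex. This is an analogy within a single process, not a named, system-wide mutex; the function name and counts are arbitrary.

```python
import threading


def run_workers(n_threads=4, increments=10000):
    """Increment a shared counter from several threads, guarded by a lock."""
    state = {"counter": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            # Acquire (like WaitForSingleObject), update, release (like ReleaseMutex).
            with lock:
                state["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]
```

Because only one thread holds the lock at a time, no increment is lost and the final count equals n_threads multiplied by increments.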
Another way for malware to execute additional code
Services run as background applications
Scheduled and run by the Windows service manager without user input
Advantages for malware writers
Services are normally run as SYSTEM or another privileged account
The SYSTEM account has more access than administrator or user accounts
Provide another way to maintain persistence on a system
Users wouldn't find anything suspicious, because malware is not running in a separate process
Key Windows API functions related to services:
OpenSCManager - returns a handle to the service control manager, which is used for all subsequent service-related function calls
Any code that interacts with services will call this function
CreateService - adds a new service to the service control manager
The caller can specify whether the service will start automatically at boot time or has to be started manually
StartService - starts a service
Used only if the service is set to be started manually
Most common service types used by malware
WIN32_SHARE_PROCESS - stores the code for the service in a DLL and combines several different services in a single, shared process
WIN32_OWN_PROCESS - stores the code in an .exe file and runs as an independent process
KERNEL_DRIVER - used for loading code into the kernel
Information about services is stored in the registry under HKLM\SYSTEM\CurrentControlSet\Services
The sc command-line tool is used to investigate and manipulate services
It includes commands for adding, deleting, starting, stopping, and querying services
The Component Object Model (COM) is an interface standard that makes it possible for different software components to call each other's code without knowledge of specifics about each other
Works with any programming language
Designed to support reusable software components
Implemented as a client/server framework
Each thread that uses COM has to call the OleInitialize or CoInitializeEx function at least once prior to calling any other COM library functions
COM objects are accessed via
Globally Unique Identifiers (GUIDs)
Class Identifiers (CLSIDs)
Interface Identifiers (IIDs)
CoCreateInstance - used to get access to COM functionality
Navigate - a common function used by malware
It allows a program to launch Internet Explorer and access a web address
Interfaces are identified with a GUID called an IID, and classes are identified with a GUID called a CLSID
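In a binary, a CLSID or IID is usually stored as 16 raw bytes rather than as a readable string, with the first three fields in little-endian order on x86. The sketch below converts between the two forms using Python's uuid module. The example value is, to the best of my knowledge, the CLSID registered for Internet Explorer, but treat it as an illustrative assumption rather than something taken from these notes.

```python
import uuid

# Assumed example value: CLSID commonly documented for Internet Explorer.
EXAMPLE_CLSID = "{0002DF01-0000-0000-C000-000000000046}"


def guid_to_memory_bytes(guid_text):
    """Return the 16 bytes as they would appear in a little-endian binary."""
    return uuid.UUID(guid_text).bytes_le


def memory_bytes_to_guid(raw):
    """Turn 16 raw bytes found in a binary back into registry-style text."""
    return "{" + str(uuid.UUID(bytes_le=bytes(raw))).upper() + "}"
```

With the text form in hand, an analyst can look the value up under the registry's CLSID or Interface keys to see which class or interface is being requested.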
The OS uses information in the registry to determine which file contains the requested COM code when a program calls CoCreateInstance
To identify what a malicious program is doing when it calls a COM function, malware analysts have to determine which offset a function is stored at
One strategy for identifying the function called by a COM client is to check the header files for the interface specified in the call to CoCreateInstance
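Turning an offset into a function name is simple arithmetic: divide the offset by the pointer size to get an index into the interface's function table, then read the name at that index from the header. The sketch below assumes a 32-bit target (4-byte pointers) and uses the first entries of the IWebBrowser2 function table as I understand them from the public headers; both the list and the helper name are illustrative assumptions.

```python
# Assumed ordering: IUnknown methods, then IDispatch, then the first
# IWebBrowser methods. Verify against the actual header before relying on it.
IWEBBROWSER2_FIRST_METHODS = [
    "QueryInterface", "AddRef", "Release",           # IUnknown
    "GetTypeInfoCount", "GetTypeInfo",               # IDispatch
    "GetIDsOfNames", "Invoke",
    "GoBack", "GoForward", "GoHome", "GoSearch",     # IWebBrowser
    "Navigate",
]


def vtable_index(offset, pointer_size=4):
    """Convert a byte offset into a zero-based function-table index."""
    if offset < 0 or offset % pointer_size != 0:
        raise ValueError("offset is not a multiple of the pointer size")
    return offset // pointer_size


def method_at_offset(offset, methods=IWEBBROWSER2_FIRST_METHODS, pointer_size=4):
    """Look up the method name stored at a given offset, if known."""
    return methods[vtable_index(offset, pointer_size)]
```

Under these assumptions, a call through offset 0x2C resolves to index 11, the twelfth entry, which is Navigate. For a 64-bit sample, pass pointer_size=8.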
Some COM objects are implemented as DLLs - loaded into the process space of the COM client executable
When a COM object is set up to be loaded as a DLL, the registry entry for the CLSID includes the subkey InprocServer32 rather than LocalServer32
Malware can implement a malicious COM server that can then be used by other applications
Browser Helper Objects (BHOs) are a common way for malware to implement COM server functionality
BHOs are third-party plug-ins for Internet Explorer
They have no restrictions, so malware authors use them to run code inside the IE process
This allows the malware to monitor Internet traffic, track browser usage, and communicate with the Internet without running its own process
Malware that implements a COM server is usually easy to detect because it exports several functions that COM servers must export
Exceptions allow a program to handle events outside the flow of normal execution
Usually caused by errors, such as division by zero
When they happen, execution transfers to a special routine that resolves the exception
When an exception occurs, Windows looks in fs:0 for the stack location that stores the exception information, and then the exception handler is called
After the exception is handled, execution returns to the main thread
Structured Exception Handling (SEH)
Windows mechanism for handling exceptions
SEH information is stored on the stack
If the exception handler for the current frame does not handle an exception, it's passed to the exception handler for the caller's frame
If none of the exception handlers responds to an exception, the top-level exception handler crashes the application
Exception handlers can be used in exploit code to gain execution
A pointer to exception-handling information is stored on the stack
During a stack overflow, an attacker can overwrite the pointer
By specifying a new exception handler, the attacker gains execution when an exception happens
User Mode
Each process has its own memory, security permissions, and resources
When a program executes an invalid instruction and crashes, Windows can reclaim all the resources and terminate the program.
Cannot access hardware directly
Restricted to only a subset of all the registers and instructions available on the CPU
Relies on the Windows API to manipulate hardware or change the state in the kernel
The presence of SYSENTER or INT 0x2E instructions in disassembly indicates that a call is being made into the kernel
Kernel Mode
All processes running in the kernel share resources and memory addresses
Kernel code has fewer security checks
If the code contains invalid instructions, then the OS cannot continue running, resulting in the famous Windows BSoD
Code running in kernel can manipulate code running in user space, but code running in user space can affect the kernel only through well-defined interfaces
Most security programs (AV and Firewalls) run in kernel mode
Malware running in kernel mode can more easily interfere with security programs or bypass firewalls
OS's auditing features don't apply to the kernel
Nearly all rootkits use code running in the kernel
Only sophisticated malware runs in the kernel
Most malware has no kernel component
The Native API is a lower-level interface for interacting with Windows that is rarely used by non-malicious programs
Bypasses the normal Windows API
User applications get access to user APIs like kernel32.dll and other DLLs, which call ntdll.dll, a special DLL that manages interactions between user space and the kernel
The ntdll functions use APIs and structures just like the ones used in the kernel
These functions make up the Native API
Programs are not supposed to call the Native API but nothing in the OS prevents them from doing so
Calling the Native API is attractive for malware because
it allows them to do things that might not otherwise be possible
Additional functionality that is not exposed in the regular Windows API
Native API calls that provide information about the system, processes, threads, handles and other items
NtContinue - a Native API function popular with malware authors
Meant to transfer execution back to the main thread of a program after an exception has been handled
Location to return to is specified in the exception context and it can be changed
Malware often uses this function to transfer execution in complicated ways to confuse an analyst and make a program more difficult to debug
Native applications
Applications that do not use the Win32 subsystem
Issue calls to the Native API only
Rare for malware but almost nonexistent for non-malicious software, so native applications are likely malicious
Subsystem in the PE header indicates if a program is a native application
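The Subsystem value sits in the PE optional header, 68 bytes after the start of that header for both 32-bit and 64-bit images, and the value 1 (IMAGE_SUBSYSTEM_NATIVE) marks a native application. A minimal sketch of reading it from raw bytes follows; the function names are hypothetical and make_minimal_pe builds a synthetic header only so the check can be run without a real sample.

```python
import struct

IMAGE_SUBSYSTEM_NATIVE = 1       # native application (no Win32 subsystem)
IMAGE_SUBSYSTEM_WINDOWS_GUI = 2
IMAGE_SUBSYSTEM_WINDOWS_CUI = 3


def read_subsystem(data):
    """Return the Subsystem field from a PE image's optional header."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # 4-byte signature + 20-byte file header, then the optional header.
    optional_header = pe_offset + 4 + 20
    (subsystem,) = struct.unpack_from("<H", data, optional_header + 68)
    return subsystem


def is_native_application(data):
    return read_subsystem(data) == IMAGE_SUBSYSTEM_NATIVE


def make_minimal_pe(subsystem):
    """Build a synthetic header (demonstration only, not a loadable file)."""
    dos = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x40)
    file_header = struct.pack("<HHIIIHH", 0x14C, 0, 0, 0, 0, 0xE0, 0x0102)
    optional = struct.pack("<H", 0x10B) + b"\x00" * 66 + struct.pack("<H", subsystem)
    return dos + b"PE\x00\x00" + file_header + optional
```

A sample whose Subsystem reads as native deserves extra scrutiny, for the reason given above.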