Crash Course in x86 Disassembly


Last updated 1 year ago

  • Levels of abstraction - create a way of hiding the implementation details

The lower the level, the less portable the code is across computer systems

  • Hardware

    • Only physical level

    • Consists of electrical circuits that use complex combinations of logical operators (XOR, AND, OR and NOT gates)

  • Microcode

    • Also known as firmware

    • Operates only on the exact circuitry for which it was designed

    • Contains microinstructions that translate from the higher machine-code level into operations on the hardware

  • Machine code

    • Consists of opcodes, hexadecimal digits that tell the processor what you want it to do

    • Implemented with several microcode instructions so that the hardware can execute the code

    • Created when a program written in a high-level language is compiled

  • Low-level languages

    • Human-readable version of a computer architecture's instruction set

    • Most common low-level language is assembly language

    • Use a disassembler to generate low-level language text

  • High-level languages

    • Provide strong abstraction from the machine level and make it easy to use programming logic and flow-control mechanisms

    • Includes: C, C++

    • Languages are typically turned into machine code by a compiler through the process of compilation

  • Interpreted languages

    • At the top level

    • Includes: C#, Perl, .NET and Java

    • Code at this level is not compiled into machine code but is instead translated into bytecode

    • Bytecode

      • Is an intermediate representation that is specific to the programming language

        • Executes within an interpreter

        • Interpreter

          • Is a program that translates bytecode into executable machine code on the fly at runtime

          • Provides an automatic level of abstraction when compared to traditional compiled code

          • Can handle errors and memory management
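The bytecode level can be observed directly in a language that ships a bytecode disassembler. The sketch below uses Python, which is not one of the languages listed above and is chosen here only for convenience. It assumes the CPython interpreter; the exact bytes and opcode names differ between versions, so treat the output as illustrative.

```python
import dis


def add_one(x):
    return x + 1


# The compiled bytecode is stored as raw bytes on the function's code object.
raw_bytecode = add_one.__code__.co_code

# dis.get_instructions() decodes those bytes into named bytecode instructions,
# much like a disassembler turns machine-code opcodes into mnemonics.
opnames = [ins.opname for ins in dis.get_instructions(add_one)]

print(raw_bytecode.hex())
print(opnames)
```

The interpreter, not the CPU, is what understands these instructions, which is why the same bytecode can run on different hardware.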

Reverse Engineering

  • Assembly Language

    • Actually a class of languages

    • Each dialect is typically used to program a single family of microprocessors:

      • x86

      • x64

      • SPARC

      • PowerPC

      • MIPS

      • ARM

    • Most malware is compiled for x86

The x86 Architecture

  • Three hardware components

    • Central Processing Unit (CPU) executes code

    • Main Memory (RAM) stores all data and code

    • Input / Output system (I/O) interfaces with devices such as hard drives, keyboards and monitors

  • Control Unit gets instructions to execute from RAM using a register (the instruction pointer), which stores the address of the instruction to execute

  • Registers

    • CPU's basic data storage units

    • Used to save time so that the CPU doesn't need to access RAM

  • Arithmetic logic unit (ALU)

    • Executes an instruction fetched from RAM

    • Places the results in registers or memory

Main memory

  • Divided into the following major sections:

    • Data

      • Used to refer to a specific section of memory called the data section

      • Contains values that are put in place when a program is initially loaded

      • Static values do not change while the program is running

      • Global values are available to any part of the program

    • Code

      • Includes the instructions fetched by the CPU to execute the program's tasks

      • Controls what the program does and how the program's tasks will be orchestrated

    • Heap

      • Used for dynamic memory during program execution, to create new values and eliminate values that the program no longer needs

      • Dynamic memory - contents can change frequently while the program is running

    • Stack

      • Used for local variables and parameters for functions

      • Helps control program flow

Instructions

  • Are the building blocks of assembly programs

  • In x86 assembly, instructions are made of a mnemonic and zero or more operands

    • mnemonic - a word that identifies the instruction to execute, such as mov (moves data)

    • operands - used to identify information used by the instruction, such as registers or data

Opcodes and Endianness

  • Opcodes

    • Tell the CPU which operation the program wants to perform

    • Disassemblers translate opcodes into human-readable instructions

  • Endianness

    • Describes whether the most significant or least significant byte is ordered first within a larger data item

    • Malware has to convert between the two during network communication, because network data uses big-endian byte order while an x86 program uses little-endian

    • Keep this in mind during analysis so you don't accidentally reverse the byte order of important indicators such as an IP address
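A short sketch of both ideas, using only the Python standard library. The instruction `mov ecx, 0x42` and the address 127.0.0.1 are illustrative values picked for this example. The first part builds the machine code for the instruction (opcode 0xB9 followed by the 32-bit immediate in little-endian order); the second shows how the same four bytes of an IP address give different integers depending on the byte order used to read them.

```python
import socket
import struct

# mov ecx, 0x42 -> opcode 0xB9, then the 32-bit immediate stored little-endian.
OPCODE_MOV_ECX_IMM32 = 0xB9
machine_code = bytes([OPCODE_MOV_ECX_IMM32]) + struct.pack("<I", 0x42)
print(machine_code.hex(" "))  # b9 42 00 00 00

# An IPv4 address in network (big-endian) byte order.
ip_network_order = socket.inet_aton("127.0.0.1")

# Reading the same bytes with each byte order gives different integers.
as_big_endian = struct.unpack(">I", ip_network_order)[0]
as_little_endian = struct.unpack("<I", ip_network_order)[0]

print(hex(as_big_endian))     # 0x7f000001
print(hex(as_little_endian))  # 0x100007f
```

When a disassembly shows a constant such as 0x0100007F next to networking code, it may be 127.0.0.1 stored in network order rather than 1.0.0.127.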

Operands

  • Used to identify the data used by an instruction

  • Three types:

    • Immediate - fixed values

    • Register - operands refer to registers

    • Memory address - refer to a memory address that contains the value of interest, typically denoted by a value, register, or equation between brackets.
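The three operand types can be told apart by their syntax alone. The helper below is a hypothetical classifier written for these notes, not part of any disassembler; it assumes Intel syntax, handles only the eight 32-bit general register names, and splits an instruction into its mnemonic and operands first.

```python
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def split_instruction(text):
    """Split 'mov eax, [ebx+8]' into ('mov', ['eax', '[ebx+8]'])."""
    mnemonic, _, rest = text.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    return mnemonic, operands


def classify_operand(operand):
    """Return 'memory address', 'register' or 'immediate' for one operand."""
    operand = operand.strip().lower()
    if operand.startswith("[") and operand.endswith("]"):
        return "memory address"  # value, register or equation in brackets
    if operand in REGISTERS:
        return "register"
    int(operand, 0)  # raises ValueError if this is not a numeric literal
    return "immediate"


mnemonic, operands = split_instruction("mov eax, [ebx+8]")
print(mnemonic, [classify_operand(op) for op in operands])
```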

Registers

  • A small amount of data storage available to the CPU

  • Contents can be accessed more quickly than storage available elsewhere

  • x86 processors have a collection of registers available for use as temporary storage or workspace

  • Four categories:

    • General registers - used by the CPU during execution

    • Segment registers - used to track sections of memory

    • Status flags - used to make decisions

    • Instruction pointers - used to keep track of the next instruction to execute

  • General registers are 32 bits in size

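  • Can be referenced as either 32 or 16 bits in assembly code (for example, EDX or its lower half DX); EAX, EBX, ECX and EDX can also be referenced as 8-bit values (such as AL and AH)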

  • General Registers

    • Store data or memory addresses

    • Typically used interchangeably to get things accomplished within the program

    • Often used in a consistent fashion across compiled code, even though this is a convention rather than a requirement

    • Example - EAX register generally contains the return value for function calls
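Partial register references are just views onto the same 32 bits. The sketch below models this with bit masks; the function name is made up for illustration. Alongside the 16-bit AX it also shows the 8-bit halves AL and AH, which exist for EAX, EBX, ECX and EDX.

```python
def sub_registers(eax):
    """Return the values seen through each partial view of a 32-bit EAX."""
    eax &= 0xFFFFFFFF
    return {
        "EAX": eax,               # all 32 bits
        "AX": eax & 0xFFFF,       # lower 16 bits
        "AH": (eax >> 8) & 0xFF,  # upper 8 bits of AX
        "AL": eax & 0xFF,         # lower 8 bits of AX
    }


views = sub_registers(0xA9DC81F5)
print({name: hex(value) for name, value in views.items()})
```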

Flags

  • EFLAGS register

    • A status register

    • During execution - each flag is either set (1) or cleared (0) to control CPU operations or indicate the results of a CPU operation

  • Most important flags to malware analysis

    • ZF - set when the result of an operation is equal to zero otherwise it is cleared

    • CF - set when the result of an operation is too large or too small for the destination operand; otherwise it is cleared

    • SF - set when the result of an operation is negative or cleared when the result is positive. Also set when the most significant bit is set after an arithmetic operation

    • TF - used for debugging. The x86 processor will execute only one instruction at a time if this flag is set
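To make the flag definitions concrete, the following sketch models a 32-bit subtraction and derives ZF, CF and SF from it. It is a simplified model written for these notes (the name `sub32` is made up), and it ignores the other flags a real CPU would update.

```python
MASK32 = 0xFFFFFFFF


def sub32(dest, value):
    """Model 'sub dest, value' on 32-bit operands and return (result, flags)."""
    dest &= MASK32
    value &= MASK32
    result = (dest - value) & MASK32
    flags = {
        "ZF": int(result == 0),  # result is zero
        "CF": int(dest < value), # unsigned borrow: result too small to fit
        "SF": result >> 31,      # most significant bit of the result
    }
    return result, flags


print(sub32(5, 5))  # zero result sets ZF
print(sub32(3, 5))  # borrow sets CF, negative result sets SF
```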

EIP, the Instruction Pointer

  • Also known as the instruction pointer or program counter

  • A register that contains the memory address of the next instruction to be executed for a program

  • Only purpose is to tell the processor what to do next

  • Corrupted EIP - leads to a program crash because it points to a memory address that does not contain legitimate program code

  • Attackers want to control EIP because it lets them control what is executed by the CPU

Simple Instructions

  • mov

    • used to move data from one location to another

    • reads and writes to memory

    • format - mov destination, source

  • lea

    • "load effective address"

    • format - lea destination, source

    • used to put a memory address into the destination

    • not used exclusively to refer to memory addresses

    • useful when calculating values because it needs fewer instructions
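The difference between mov and lea with a bracketed operand can be sketched as follows. The memory contents and register values are hypothetical, and `effective_address` is an illustrative helper modelling the base + index * scale + displacement form of an x86 memory operand.

```python
def effective_address(base=0, index=0, scale=1, displacement=0):
    """Compute base + index * scale + displacement, wrapped to 32 bits."""
    return (base + index * scale + displacement) & 0xFFFFFFFF


# Hypothetical memory: one 32-bit value stored at address 0x1008.
memory = {0x1008: 0xDEADBEEF}
ebx = 0x1000

# mov eax, [ebx+8] loads the value stored AT the computed address.
mov_result = memory[effective_address(base=ebx, displacement=8)]

# lea eax, [ebx+8] loads the computed address itself.
lea_result = effective_address(base=ebx, displacement=8)

# lea as arithmetic: lea ebx, [eax+eax*4+5] computes eax * 5 + 5 in one step.
eax = 3
lea_math = effective_address(base=eax, index=eax, scale=4, displacement=5)

print(hex(mov_result), hex(lea_result), lea_math)
```

The last line is why lea shows up in code that never touches memory: the compiler is using the address calculation hardware as a cheap multiply-and-add.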

Arithmetic

  • add destination, value

  • sub destination, value

    • Zero flag (ZF) is set if the result is zero

    • Carry Flag (CF) is set if the destination is less than the value subtracted

  • Multiplication and division

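    • Both act on predefined registers

    • Format - mul value and div value

    • mul multiplies EAX by value and stores the result as a 64-bit value across two registers (EDX:EAX)

    • div divides the 64-bit value in EDX:EAX by value, storing the quotient in EAX and the remainder in EDX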

  • Shift registers

    • shr and shl shift the bits in the destination operand to the right and left, respectively

    • Format - shr destination, count and shl destination, count

  • NOP

    • Does nothing, execution just moves to the next instruction

    • opcode is 0x90

    • Commonly used in a NOP sled for buffer overflow attacks

    • Provides execution padding - reduces the risk that the malicious shellcode will start executing in the middle
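The multiplication, division and shift bullets above can be modelled in a few lines. This is a simplified sketch with made-up function names: it covers only unsigned 32-bit mul and div, where the 64-bit value lives in the EDX:EAX register pair, and it assumes shift counts between 0 and 31.

```python
MASK32 = 0xFFFFFFFF


def mul(eax, value):
    """Model 'mul value': EAX * value, returned as the pair (EDX, EAX)."""
    product = (eax & MASK32) * (value & MASK32)
    return (product >> 32) & MASK32, product & MASK32


def div(edx, eax, value):
    """Model 'div value': EDX:EAX / value, returned as (EAX=quotient, EDX=remainder)."""
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, value & MASK32)
    if quotient > MASK32:
        # A real CPU raises a divide error when the quotient does not fit in EAX.
        raise OverflowError("quotient does not fit in 32 bits")
    return quotient, remainder


def shl(dest, count):
    """Shift left; bits pushed past bit 31 are lost."""
    return (dest << count) & MASK32


def shr(dest, count):
    """Shift right; vacated high bits are filled with zeros."""
    return (dest & MASK32) >> count


print(mul(0xFFFFFFFF, 2))  # high half lands in EDX
print(div(0, 17, 5))       # quotient 3, remainder 2
print(hex(shl(0x80000001, 1)), shr(8, 2))
```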

The Stack

  • Holds the memory used for functions, local variables and flow control

  • Is a data structure characterized by pushing and popping

  • Last in, first out structure

  • Short term storage only

  • Primary usage is for the management of data exchanged between function calls

  • Stack instructions - push, pop, call, leave, enter and ret

  • ESP

    • The stack pointer

    • Contains a memory address that points to the top of the stack

  • EBP

    • The base pointer

    • Stays consistent within a given function

    • The program can use it as a placeholder to keep track of the location of local variables and parameters
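A minimal model of push and pop shows how ESP tracks the top of the stack. The class name and the starting address are arbitrary choices for this sketch. On x86 the stack grows toward lower addresses, so a push decrements ESP and a pop increments it.

```python
class Stack32:
    """Toy model of a 32-bit x86 stack; memory is a dict of address -> value."""

    def __init__(self, top=0x00100000):
        self.esp = top    # stack pointer: address of the current top of stack
        self.memory = {}

    def push(self, value):
        self.esp -= 4     # make room for one 32-bit value
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory.pop(self.esp)
        self.esp += 4     # release the slot
        return value


stack = Stack32()
stack.push(0x11111111)
stack.push(0x22222222)
print(hex(stack.esp))    # two slots below the starting address
print(hex(stack.pop()))  # last in, first out
```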

Function Calls

  • Functions

    • Portions of code within a program that perform a specific task

    • Relatively independent of the remaining code

  • Prologue - prepares the stack and registers for use within the function

  • Epilogue - restores the stack and registers to their state before the function was called

  • Flow of function call implementation

    1. Arguments are placed on the stack using push instructions

    2. The function is called using call memory_location. This pushes the return address (the address of the instruction that follows the call, which EIP holds at that point) onto the stack.

      • This address is used to return to the main code when the function is finished.

      • When the function begins, EIP is set to memory_location (the start of the function)

    3. Using the prologue, space is allocated on the stack for local variables and EBP (base pointer) is pushed onto the stack

    4. The function performs its work

    5. Using the function epilogue, the stack is restored. ESP is adjusted to free the local variables and EBP is restored so that the calling function can address its variables properly

    6. The function returns by executing the ret instruction, which pops the return address off the stack into EIP. The program continues executing from where the original call was made

    7. Stack is adjusted to remove arguments
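The seven steps above can be traced with a toy CPU model. Everything here is an assumption made for illustration: the addresses, the `Cpu` class, the 8 bytes of local space, and the cdecl-style convention in which arguments are pushed right to left and removed by the caller.

```python
class Cpu:
    """Toy 32-bit CPU state: three registers plus a dict used as stack memory."""

    def __init__(self):
        self.esp = 0x00100000
        self.ebp = 0
        self.eip = 0
        self.mem = {}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


def simulate_call(cpu, func_addr, return_addr, args, local_bytes=8):
    for arg in reversed(args):     # 1. push the arguments
        cpu.push(arg)
    cpu.push(return_addr)          # 2. call: push return address ...
    cpu.eip = func_addr            #    ... and jump to the function
    cpu.push(cpu.ebp)              # 3. prologue: push ebp
    cpu.ebp = cpu.esp              #    mov ebp, esp
    cpu.esp -= local_bytes         #    sub esp, N (space for locals)
    first_arg = cpu.mem[cpu.ebp + 8]  # 4. work: first argument is at [ebp+8]
    cpu.esp = cpu.ebp              # 5. epilogue: mov esp, ebp
    cpu.ebp = cpu.pop()            #    pop ebp
    cpu.eip = cpu.pop()            # 6. ret: pop the return address into EIP
    cpu.esp += 4 * len(args)       # 7. caller removes the arguments
    return first_arg


cpu = Cpu()
cpu.ebp = 0x000FFFF0               # pretend the caller already has a frame
print(simulate_call(cpu, 0x00401000, 0x0040200A, [10, 20]))
print(hex(cpu.eip), hex(cpu.esp), hex(cpu.ebp))
```

After the call completes, ESP and EBP are back to their starting values and EIP holds the return address, which is the property the prologue and epilogue exist to guarantee.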

Branch

  • A sequence of code that is conditionally executed depending on the flow of the program

  • jump instructions

    • The most common way branching occurs

    • jmp location - causes the next instruction executed to be the one specified by the jmp

  • Conditional Jumps

    • Use the flags to determine whether to jump or to proceed to the next instruction
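A conditional jump is a decision about the next value of EIP based on the flags left by an earlier instruction, usually cmp or test. The sketch below models cmp as a subtraction that only sets flags, then resolves a handful of jump mnemonics (jz, jnz, jb, jae). The addresses and function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def cmp(dest, value):
    """Model 'cmp dest, value': subtract, keep only the flags."""
    dest &= MASK32
    value &= MASK32
    return {"ZF": int(dest == value), "CF": int(dest < value)}


def next_eip(mnemonic, flags, target, fallthrough):
    """Return the address executed next for a jump at the current instruction."""
    taken = {
        "jmp": True,             # unconditional
        "jz": flags["ZF"] == 1,  # jump if zero (operands were equal)
        "jnz": flags["ZF"] == 0, # jump if not zero
        "jb": flags["CF"] == 1,  # jump if below (unsigned less than)
        "jae": flags["CF"] == 0, # jump if above or equal (unsigned)
    }[mnemonic]
    return target if taken else fallthrough


flags = cmp(3, 5)
print(hex(next_eip("jz", flags, target=0x401050, fallthrough=0x401010)))
print(hex(next_eip("jb", flags, target=0x401050, fallthrough=0x401010)))
```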

Rep Instruction

  • Rep instructions are a set of instructions for manipulating data buffers

  • The buffers are usually in the form of an array of bytes

  • ESI - Source Index Register

  • EDI - Destination Index Register

  • ECX - used as the counting variable
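How the three registers cooperate can be sketched with rep movsb, one common rep-prefixed instruction, which copies ECX bytes from the address in ESI to the address in EDI. The model below is a simplification: memory is a Python bytearray, the registers are plain integers, and the direction flag is assumed to be cleared so both index registers count upward.

```python
def rep_movsb(memory, esi, edi, ecx):
    """Model 'rep movsb': copy ECX bytes from [ESI] to [EDI], one per iteration."""
    while ecx != 0:
        memory[edi] = memory[esi]  # movsb: copy a single byte
        esi += 1                   # advance the source index
        edi += 1                   # advance the destination index
        ecx -= 1                   # rep: decrement the counter, stop at zero
    return esi, edi, ecx


# Source buffer at offset 0, zeroed destination buffer at offset 7.
memory = bytearray(b"MALWARE" + bytes(7))
esi, edi, ecx = rep_movsb(memory, esi=0, edi=7, ecx=7)
print(bytes(memory[7:14]), esi, edi, ecx)
```

In a disassembly this whole loop is a single instruction, which is why compilers often emit it for inlined memcpy-style copies.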
