Regex Basics: A Beginner's Guide to Regular Expressions
Regular expressions don't have to be scary. Learn the fundamentals with practical examples.
There's a famous joke in programming circles: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
It's funny because there's truth to it. Regex can be cryptic, frustrating, and maddeningly difficult to debug. A single character can change everything. And yet, once you understand the basics, regex becomes one of the most powerful tools in your toolkit.
Let me show you how to get started without losing your mind.
What Actually Is Regex?
A regular expression (regex) is a pattern that describes a set of strings. It's a way to say "find me text that looks like this" without knowing the exact text you're looking for.
For example, the regex pattern \d{3}-\d{4} matches any string that looks like a phone number: three digits, a hyphen, four digits. It would match "555-1234", "123-4567", "999-0000" — any text that fits that pattern.
Regex is used for:
- Validation: Does this string look like an email address?
- Search: Find all URLs in this document
- Replace: Change all phone numbers to format (XXX) XXX-XXXX
- Extraction: Pull all the prices out of this webpage
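Here is a rough sketch of those four jobs in code. I'm using Python's built-in re module purely as an example (the language and the sample sentence are my assumptions; the pattern is the phone-number one from above).

```python
import re

text = "Call 555-1234 or 555-9876 for details."
phone = re.compile(r"\d{3}-\d{4}")

# Search: find the first match anywhere in the text
first = phone.search(text).group()

# Extraction: collect every match
all_numbers = phone.findall(text)

# Replace: swap each match for a placeholder
redacted = phone.sub("XXX-XXXX", text)

# Validation: does the WHOLE string fit the pattern?
is_valid = phone.fullmatch("555-1234") is not None
too_long = phone.fullmatch("555-12345") is not None

print(first, all_numbers, redacted, is_valid, too_long)
```

Other languages expose the same four operations under different names, so the pattern itself carries over even when the function calls don't.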
Nearly every mainstream programming language supports regex, though the exact syntax and feature set vary a little between engines. It's one of those skills that transfers everywhere.
Your First Pattern
Let's start simple. The most basic regex is just literal text:
Pattern: hello
Text: Say hello to the world
Match: "hello"
The pattern hello matches the literal text "hello". Nothing fancy yet.
Try it in our Regex Tester: enter the pattern hello and the text "Say hello to the world". You'll see "hello" highlighted.
Character Classes: Matching Sets of Characters
What if you want to match any digit? Or any letter? That's where character classes come in.
Square brackets define a set of characters to match:
Pattern: [aeiou]
Matches: any single vowel
This matches exactly one character that's either a, e, i, o, or u.
You can also use ranges:
Pattern: [a-z]
Matches: any lowercase letter
Pattern: [A-Z]
Matches: any uppercase letter
Pattern: [0-9]
Matches: any digit
Pattern: [a-zA-Z0-9]
Matches: any letter or digit
These are so common that regex has shortcuts:
\d = [0-9] any digit
\w = [a-zA-Z0-9_] any "word" character
\s = [ \t\n\r\f\v] any whitespace
. = any character (except newline)
The uppercase versions are the opposite:
\D = [^0-9] NOT a digit
\W = [^a-zA-Z0-9_] NOT a word character
\S = [^\s] NOT whitespace
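To see the classes and shortcuts in action, here's a small sketch using Python's re module (my choice of language; the sample string is made up for illustration).

```python
import re

sample = "Room 42, Floor_3!"

# A bracketed set matches one character from the set
vowels = re.findall(r"[aeiou]", sample)

# \d is the digit shortcut
digits = re.findall(r"\d", sample)

# \W is the opposite of \w: anything that is NOT a letter, digit, or underscore
non_word = re.findall(r"\W", sample)

# \w+ grabs whole runs of word characters
words = re.findall(r"\w+", sample)

print(vowels, digits, non_word, words)
```

One caveat for Python specifically: on ordinary text strings, \d and \w also match digits and letters outside ASCII unless you pass the re.ASCII flag.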
Quantifiers: How Many?
So far, each pattern element matches exactly one character. Quantifiers let you match multiple characters:
* zero or more
+ one or more
? zero or one (optional)
{3} exactly 3
{2,5} between 2 and 5
{3,} 3 or more
Let's build something useful. A US phone number like 555-123-4567:
Pattern: \d{3}-\d{3}-\d{4}
Matches: "555-123-4567"
Breaking it down:
\d{3}: exactly 3 digits
-: a literal hyphen
\d{3}: exactly 3 more digits
-: another hyphen
\d{4}: exactly 4 digits
Now let's handle optional elements. What if the phone number might or might not have parentheses around the area code?
Pattern: \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
This matches:
- 555-123-4567
- (555) 123-4567
- 555.123.4567
- 555 123 4567
The ? after \( and after \) makes each parenthesis optional. The [-.\s]? matches an optional separator (hyphen, period, or space).
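If you'd rather check that pattern in code than by eye, this sketch runs it against a handful of candidates. Python is my assumption here, and the last three test strings are ones I added to probe the edges.

```python
import re

phone = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

candidates = [
    "555-123-4567",
    "(555) 123-4567",
    "555.123.4567",
    "555 123 4567",
    "5551234567",     # every separator is optional, so this passes too
    "(555-123-4567",  # unbalanced parenthesis: each ? is independent
    "555-12-4567",    # middle group has only two digits
]

# fullmatch requires the whole string to fit the pattern
results = {c: phone.fullmatch(c) is not None for c in candidates}

for candidate, ok in results.items():
    print(f"{candidate!r}: {ok}")
```

Notice that the unbalanced "(555-123-4567" is accepted. Optional pieces don't know about each other, so this pattern is a reasonable filter, not a strict validator.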
Anchors: Position Matters
Sometimes you need to match text at a specific position:
^ start of string (or line)
$ end of string (or line)
\b word boundary
Without anchors:
Pattern: test
Text: "testing 123"
Matches: "test" (inside "testing")
With anchors:
Pattern: ^test
Text: "testing 123"
Matches: "test" (at the start)
Pattern: test$
Text: "this is a test"
Matches: "test" (at the end)
Pattern: \btest\b
Text: "testing a test today"
Matches: "test" (only the standalone word, not inside "testing")
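The same anchor examples, sketched in Python (an assumed language choice). The last line also shows the "(or line)" part: in Python you opt into line-by-line anchoring with the re.MULTILINE flag.

```python
import re

text = "testing a test today"

# No anchor: "test" is found inside "testing"
unanchored = re.search(r"test", "testing 123").group()

# ^ pins the match to the start of the string
at_start = re.search(r"^test", "testing 123") is not None
not_at_start = re.search(r"^test", "a testing") is not None

# $ pins the match to the end of the string
at_end = re.search(r"test$", "this is a test") is not None

# \b only accepts the standalone word, which starts at index 10
word_positions = [m.start() for m in re.finditer(r"\btest\b", text)]

# With re.MULTILINE, ^ and $ also match at each line break
first_words = re.findall(r"^\w+", "one two\nthree four", re.MULTILINE)

print(unanchored, at_start, not_at_start, at_end, word_positions, first_words)
```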
Groups and Capturing
Parentheses serve two purposes in regex: grouping and capturing.
Grouping lets you apply quantifiers to multiple characters:
Pattern: (ha)+
Matches: "ha", "haha", "hahaha", etc.
Capturing lets you extract parts of the match:
Pattern: (\d{3})-(\d{4})
Text: "Call 555-1234"
Full match: "555-1234"
Group 1: "555"
Group 2: "1234"
This is incredibly useful for parsing structured data. Want to extract all the domains from a list of URLs? Use capturing groups.
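Here's roughly what that looks like in Python (my assumed language). The domain pattern and the two sample URLs are illustrative guesses of mine, not a complete URL parser.

```python
import re

# Capturing the two halves of the phone number from the example above
m = re.search(r"(\d{3})-(\d{4})", "Call 555-1234")
full = m.group(0)    # group 0 is always the full match
prefix = m.group(1)
line = m.group(2)

# Extracting domains: capture everything after "://" up to the next "/" or space
urls = ["https://example.com/page", "http://docs.example.org/a/b"]
domains = [re.search(r"https?://([^/\s]+)", u).group(1) for u in urls]

print(full, prefix, line, domains)
```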
Alternation: This OR That
The pipe character | means "or":
Pattern: cat|dog
Matches: "cat" or "dog"
Pattern: (cats?|dogs?)
Matches: "cat", "cats", "dog", or "dogs"
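A quick sketch of alternation in Python (assumed language, made-up sentences). The second half shows why the parentheses matter: | splits the entire pattern unless a group limits its reach.

```python
import re

text = "Two cats chased a dog while another cat watched the dogs."
animals = re.findall(r"cats?|dogs?", text)

# Without a group, | separates "gra" from "ey"
ungrouped = re.findall(r"gra|ey", "gray grey")

# With a (non-capturing) group, only the middle letter alternates
grouped = re.findall(r"gr(?:a|e)y", "gray grey")

print(animals, ungrouped, grouped)
```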
Real-World Examples
Let's look at some patterns you'll actually use.
Email Validation (Simple)
Pattern: ^\S+@\S+\.\S+$
This matches strings that:
^: Start of string
\S+: One or more non-whitespace characters
@: Literal @ symbol
\S+: One or more non-whitespace characters
\.: Literal period (escaped with \)
\S+: One or more non-whitespace characters
$: End of string
This isn't RFC-compliant email validation (that's absurdly complex), but it catches the obvious mistakes.
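As a sketch, here is that check wrapped in a small Python function. The function name looks_like_email and the sample addresses are my own, chosen to show what the pattern catches and what it lets through.

```python
import re

EMAIL = re.compile(r"^\S+@\S+\.\S+$")

def looks_like_email(value: str) -> bool:
    """Loose sanity check, not real email validation."""
    return EMAIL.match(value) is not None

samples = [
    "user@example.com",
    "user@example",         # no period after the @
    "user name@example.com",  # contains whitespace
    "@example.com",         # nothing before the @
    "a@b@c.d",              # nonsense, but the pattern accepts it
]
checks = {s: looks_like_email(s) for s in samples}

for sample, ok in checks.items():
    print(f"{sample!r}: {ok}")
```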
URL Detection
Pattern: https?://[^\s]+
https?: "http" with an optional "s"
://: Literal "://"
[^\s]+: One or more non-whitespace characters
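Trying the URL pattern on a sentence (Python assumed, sample text invented) shows a side effect worth knowing about: "one or more non-whitespace characters" happily includes trailing punctuation.

```python
import re

text = "Docs at https://example.com/docs and http://test.org/page?id=1, see also ftp://old.net"

urls = re.findall(r"https?://[^\s]+", text)

# One simple way to tidy up: strip common trailing punctuation afterwards
cleaned = [u.rstrip(".,;:!?)") for u in urls]

print(urls)
print(cleaned)
```

The ftp:// link is skipped because the pattern only allows http or https, and the second URL picks up the comma until it is stripped.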
Extracting Hashtags
Pattern: #\w+
#: Literal hash
\w+: One or more word characters
This matches "#coding", "#100DaysOfCode", etc.
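A small Python sketch (assumed language, invented post text) that pulls the hashtags out of a sentence:

```python
import re

post = "Day 5 of #100DaysOfCode: loving #coding and #regex-tips!"

hashtags = re.findall(r"#\w+", post)

print(hashtags)
```

The last tag comes back as "#regex" because a hyphen is not a word character, so \w+ stops there.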
Matching Dates
Pattern: \d{1,2}/\d{1,2}/\d{2,4}
Matches dates like "1/1/24", "12/31/2024", "3/15/99".
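Keep in mind that this pattern checks shape, not meaning. The sketch below (Python assumed; the plausible helper and the month/day order are my own assumptions) shows it accepting an impossible date, then filters with a simple range check.

```python
import re

DATE = re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}")

text = "Due 1/1/24, renewed 12/31/2024, archived 3/15/99, bogus 99/99/999."
found = DATE.findall(text)

def plausible(date_text: str) -> bool:
    """Hypothetical follow-up check, assuming month/day/year order."""
    month, day, _year = (int(part) for part in date_text.split("/"))
    return 1 <= month <= 12 and 1 <= day <= 31

plausible_dates = [d for d in found if plausible(d)]

print(found)
print(plausible_dates)
```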
Validating Hex Colors
Pattern: ^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$
Matches "#FFF" or "#FFFFFF" format hex colors.
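Here is a sketch of the hex color check in Python (assumed language; the helper name is_hex_color and the test values are mine).

```python
import re

HEX_COLOR = re.compile(r"^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$")

def is_hex_color(value: str) -> bool:
    return HEX_COLOR.match(value) is not None

samples = ["#FFF", "#FFFFFF", "#ffcc00", "#FFFF", "FFF", "#GGG"]
verdicts = {s: is_hex_color(s) for s in samples}

# The capturing group holds the digits without the leading "#"
digits = HEX_COLOR.match("#ffcc00").group(1)

print(verdicts, digits)
```

The four-digit "#FFFF" fails because the alternation only allows exactly 6 or exactly 3 hex digits between the anchors.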
Common Mistakes and Gotchas
Forgetting to Escape Special Characters
These characters have special meaning in regex: . * + ? ^ $ { } [ ] \ | ( )
If you want to match a literal period, you need to escape it: \.
Pattern: example.com (WRONG - also matches "exampleXcom")
Pattern: example\.com (RIGHT - matches "example.com")
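In Python (my assumed language), you can see the difference directly, and re.escape will do the escaping for you when the text to match comes from a variable. The price string is an invented example.

```python
import re

# Unescaped: the "." matches any character, including "X"
loose = re.search(r"example.com", "exampleXcom") is not None

# Escaped: only a literal period will do
strict = re.search(r"example\.com", "exampleXcom") is not None

# re.escape adds the backslashes for you
escaped = re.escape("example.com")

# Handy for literal text full of special characters
price = "$5.00 (sale)"
price_found = re.search(re.escape(price), "Now $5.00 (sale) only") is not None

print(loose, strict, escaped, price_found)
```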
Greedy vs Lazy Matching
By default, quantifiers are "greedy": they match as much as possible:
Pattern: <.*>
Text: "<b>bold</b>"
Matches: "<b>bold</b>" (the entire string!)
The .* gobbles up everything between the first < and the LAST >. If you want the shortest match, add ? to make it "lazy":
Pattern: <.*?>
Text: "<b>bold</b>"
Matches: "<b>" and "</b>" (separately)
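The greedy and lazy versions side by side, sketched in Python (assumed language). The third pattern is an alternative I'm adding: a negated character class that says "anything except >", which avoids the problem without relying on laziness.

```python
import re

html = "<b>bold</b>"

greedy = re.findall(r"<.*>", html)     # runs to the LAST ">"
lazy = re.findall(r"<.*?>", html)      # stops at the first ">" each time
negated = re.findall(r"<[^>]*>", html) # never crosses a ">" at all

print(greedy, lazy, negated)
```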
Forgetting Anchors in Validation
Pattern: \d{5}
Text: "My zip is 123456789"
Matches: "12345" (just the first five digits of a longer number!)
If you're validating that a string IS a 5-digit zip code:
Pattern: ^\d{5}$
Now it only matches if the ENTIRE string is exactly 5 digits.
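A sketch of the same check in Python (assumed language). It also shows one Python-specific wrinkle: $ is allowed to match just before a trailing newline, so re.fullmatch is the stricter option when the whole string must fit.

```python
import re

# Unanchored: finds five digits inside a longer number
partial = re.search(r"\d{5}", "My zip is 123456789").group()

# Anchored: the string must be exactly five digits
exact_ok = re.search(r"^\d{5}$", "12345") is not None
exact_long = re.search(r"^\d{5}$", "123456789") is not None

# In Python, $ also matches before a final newline...
with_newline = re.search(r"^\d{5}$", "12345\n") is not None

# ...while fullmatch insists on the entire string
strict_newline = re.fullmatch(r"\d{5}", "12345\n") is not None

print(partial, exact_ok, exact_long, with_newline, strict_newline)
```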
Practice, Practice, Practice
Regex is a skill that improves with practice. Use our Regex Tester to experiment:
- Write a pattern
- See matches highlighted in real-time
- Test against multiple strings
- Debug why something isn't matching
Start with simple patterns and build up complexity. When you're stuck, break the problem down: what are you trying to match, exactly?
Quick Reference
Character Classes:
. any character
\d digit [0-9]
\w word char [a-zA-Z0-9_]
\s whitespace
[abc] a, b, or c
[^abc] NOT a, b, or c
Quantifiers:
* 0 or more
+ 1 or more
? 0 or 1
{n} exactly n
{n,m} between n and m
Anchors:
^ start of string
$ end of string
\b word boundary
Groups:
(...) capturing group
(?:...) non-capturing group
| alternation (or)
Beyond the Basics
This guide covers the fundamentals, but regex goes much deeper. Once you're comfortable with these concepts, you can explore:
- Lookahead and lookbehind assertions
- Backreferences
- Named capture groups
- Regex flags (case-insensitive, multiline, etc.)
- Unicode support
But don't worry about those yet. Master the basics first. You'll be surprised how much you can accomplish with just the patterns in this guide.
Now go practice. Our Regex Tester is waiting.

