Regex Basics: A Beginner's Guide to Regular Expressions

December 1, 2025 · 14 min read


Regular expressions don't have to be scary. Learn the fundamentals with practical examples.

There's a famous joke in programming circles: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."

It's funny because there's truth to it. Regex can be cryptic, frustrating, and maddeningly difficult to debug. A single character can change everything. And yet, once you understand the basics, regex becomes one of the most powerful tools in your toolkit.

Let me show you how to get started without losing your mind.

What Actually Is Regex?

A regular expression (regex) is a pattern that describes a set of strings. It's a way to say "find me text that looks like this" without knowing the exact text you're looking for.

For example, the regex pattern \d{3}-\d{4} matches any string that looks like a phone number: three digits, a hyphen, four digits. It would match "555-1234", "123-4567", "999-0000" — any text that fits that pattern.

Regex is used for:

Validation: Does this string look like an email address?
Search: Find all URLs in this document
Replace: Change all phone numbers to format (XXX) XXX-XXXX
Extraction: Pull all the prices out of this webpage

Nearly every mainstream programming language supports regex, either built in or through a standard library. The syntax differs slightly between engines, but the core ideas transfer everywhere.
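To make those four uses concrete, here is a rough sketch in Python using the standard re module. The sample sentence and the variable names are made up for illustration; other languages offer the same operations under different names.

```python
import re

# Hypothetical sample text, invented for this example.
text = "Call 555-1234 or 555-9876 for details."
phone = re.compile(r"\d{3}-\d{4}")

# Validation: does the WHOLE string fit the pattern?
is_valid = phone.fullmatch("555-1234") is not None

# Search: find the first occurrence anywhere in the text.
first = phone.search(text).group()

# Extraction: collect every occurrence.
all_numbers = phone.findall(text)

# Replace: rewrite every occurrence.
masked = phone.sub("XXX-XXXX", text)

print(is_valid, first, all_numbers, masked)
```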

Your First Pattern

Let's start simple. The most basic regex is just literal text:

Pattern: hello
Text: Say hello to the world
Match: "hello"

The pattern hello matches the literal text "hello". Nothing fancy yet.

Try it in our Regex Tester: enter the pattern hello and the text "Say hello to the world". You'll see "hello" highlighted.
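If you'd rather try the same thing in code, this is a minimal Python sketch (assuming the standard re module) that runs the literal pattern against the same sentence:

```python
import re

match = re.search(r"hello", "Say hello to the world")

found = match.group()    # the text that matched
position = match.span()  # (start, end) indexes of the match

print(found, position)   # hello (4, 9)
```

re.search returns None when nothing matches, so real code should check for that before calling .group().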

Character Classes: Matching Sets of Characters

What if you want to match any digit? Or any letter? That's where character classes come in.

Square brackets define a set of characters to match:

Pattern: [aeiou]
Matches: any single vowel

This matches exactly one character that's either a, e, i, o, or u.

You can also use ranges:

Pattern: [a-z]
Matches: any lowercase letter

Pattern: [A-Z]
Matches: any uppercase letter

Pattern: [0-9]
Matches: any digit

Pattern: [a-zA-Z0-9]
Matches: any letter or digit

These are so common that regex has shortcuts:

\d = [0-9]          any digit
\w = [a-zA-Z0-9_]   any "word" character
\s = [ \t\n\r\f\v]  any whitespace
.  = anything       any character (except newline)

The uppercase versions are the opposite:

\D = [^0-9]         NOT a digit
\W = [^a-zA-Z0-9_]  NOT a word character
\S = [^\s]          NOT whitespace
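Here is a small Python sketch of these classes in action. The sample string is invented. One caveat worth knowing: in Python 3, \d and \w on ordinary strings also match non-ASCII digits and letters unless you pass the re.ASCII flag, so the [0-9] equivalence above is exact only in ASCII mode.

```python
import re

sample = "Room 42, Floor_3!"  # made-up text for the demo

vowels = re.findall(r"[aeiou]", sample)  # one vowel per match
digits = re.findall(r"\d", sample)       # one digit per match
non_word = re.findall(r"\W", sample)     # anything that is not \w
words = re.findall(r"\w+", sample)       # runs of word characters

print(vowels)    # ['o', 'o', 'o', 'o']
print(digits)    # ['4', '2', '3']
print(non_word)  # [' ', ',', ' ', '!']
print(words)     # ['Room', '42', 'Floor_3']
```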

Quantifiers: How Many?

So far, each pattern element matches exactly one character. Quantifiers let you match multiple characters:

*     zero or more
+     one or more
?     zero or one (optional)
{3}   exactly 3
{2,5} between 2 and 5
{3,}  3 or more

Let's build something useful. A US phone number like 555-123-4567:

Pattern: \d{3}-\d{3}-\d{4}
Matches: "555-123-4567"

Breaking it down:

\d{3} — exactly 3 digits
- — a literal hyphen
\d{3} — exactly 3 more digits
- — another hyphen
\d{4} — exactly 4 digits

Now let's handle optional elements. What if the phone number might or might not have parentheses around the area code?

Pattern: \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}

This matches:

555-123-4567
(555) 123-4567
555.123.4567
555 123 4567

The ? after \( and \) makes each parenthesis optional. The [-.\s]? matches an optional separator (hyphen, period, or whitespace).
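A quick way to check a pattern like this is to run it against a list of candidates. The sketch below uses Python's re.fullmatch so the whole string has to fit; the last two candidates are extra test cases added here, not part of the list above.

```python
import re

PHONE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

candidates = [
    "555-123-4567",
    "(555) 123-4567",
    "555.123.4567",
    "555 123 4567",
    "555-12-4567",    # too few digits in the middle group
    "(555-123-4567",  # unbalanced parenthesis
]

# Map each candidate to True/False depending on whether it fits entirely.
results = {c: PHONE.fullmatch(c) is not None for c in candidates}

for candidate, ok in results.items():
    print(f"{candidate!r}: {ok}")
```

Notice that "(555-123-4567" still passes: each parenthesis is optional on its own, so the pattern cannot insist that they come in pairs. That is fine for a loose match, but worth knowing before you rely on it for strict validation.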

Anchors: Position Matters

Sometimes you need to match text at a specific position:

^   start of string (or line)
$   end of string (or line)
\b  word boundary

Without anchors:

Pattern: test
Text: "testing 123"
Matches: "test" (inside "testing")

With anchors:

Pattern: ^test
Text: "testing 123"
Matches: "test" (at the start)

Pattern: test$
Text: "this is a test"
Matches: "test" (at the end)

Pattern: \btest\b
Text: "testing a test today"
Matches: "test" (only the standalone word, not inside "testing")
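The same anchor examples can be sketched in Python. The "(or line)" behavior is opt-in there: ^ and $ match at line breaks only when you pass the re.MULTILINE flag. The "a test" and "one\ntwo" strings are extra samples added for this demo.

```python
import re

starts = re.search(r"^test", "testing 123")   # match at the start
no_start = re.search(r"^test", "a test")      # None: "test" is not at the start
ends = re.search(r"test$", "this is a test")  # match at the end

# \b keeps the match from landing inside "testing".
whole_words = [m.span() for m in re.finditer(r"\btest\b", "testing a test today")]

# By default ^ means start of string; with re.MULTILINE it also means start of line.
first_only = re.findall(r"^\w+", "one\ntwo")
each_line = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)

print(whole_words)            # [(10, 14)]
print(first_only, each_line)  # ['one'] ['one', 'two']
```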

Groups and Capturing

Parentheses serve two purposes in regex: grouping and capturing.

Grouping lets you apply quantifiers to multiple characters:

Pattern: (ha)+
Matches: "ha", "haha", "hahaha", etc.

Capturing lets you extract parts of the match:

Pattern: (\d{3})-(\d{4})
Text: "Call 555-1234"
Full match: "555-1234"
Group 1: "555"
Group 2: "1234"

This is incredibly useful for parsing structured data. Want to extract all the domains from a list of URLs? Use capturing groups.
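Here is how capturing might look in Python. The first part mirrors the example above; the second is a rough take on the domain idea, using made-up URLs and a deliberately simple pattern that treats everything up to the first slash as the domain.

```python
import re

match = re.search(r"(\d{3})-(\d{4})", "Call 555-1234")
full = match.group(0)          # the full match
prefix, line = match.groups()  # group 1 and group 2

# Hypothetical URL list for illustration.
urls = "https://example.com/docs http://blog.example.org/post/1"

# With exactly one capturing group, findall returns just that group.
domains = re.findall(r"https?://([^/\s]+)", urls)

print(full, prefix, line)  # 555-1234 555 1234
print(domains)             # ['example.com', 'blog.example.org']
```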

Alternation: This OR That

The pipe character | means "or":

Pattern: cat|dog
Matches: "cat" or "dog"

Pattern: (cats?|dogs?)
Matches: "cat", "cats", "dog", or "dogs"
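One thing to watch with alternation is that it happily matches inside longer words. The Python sketch below (with an invented sentence) shows the problem and one possible fix using word boundaries around a non-capturing group.

```python
import re

sentence = "two cats, one dog, and a catalog"  # made-up text

# Plain alternation also finds the "cat" hiding inside "catalog".
loose = re.findall(r"cats?|dogs?", sentence)

# Word boundaries around the group keep only standalone words.
strict = re.findall(r"\b(?:cats?|dogs?)\b", sentence)

print(loose)   # ['cats', 'dog', 'cat']
print(strict)  # ['cats', 'dog']
```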

Real-World Examples

Let's look at some patterns you'll actually use.

Email Validation (Simple)

Pattern: ^\S+@\S+\.\S+$

This matches strings that:

^ — Start of string
\S+ — One or more non-whitespace characters
@ — Literal @ symbol
\S+ — One or more non-whitespace characters
\. — Literal period (escaped with \)
\S+ — One or more non-whitespace characters
$ — End of string

This isn't RFC-compliant email validation (that's absurdly complex), but it catches the obvious mistakes.
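Wrapped in a small helper, the check might look like the sketch below. The function name looks_like_email is invented for this example, and the last test case shows just how loose the pattern is.

```python
import re

EMAIL = re.compile(r"^\S+@\S+\.\S+$")

def looks_like_email(value: str) -> bool:
    """Loose sanity check only; not a substitute for real email verification."""
    return EMAIL.fullmatch(value) is not None

print(looks_like_email("user@example.com"))       # True
print(looks_like_email("user@example"))           # False: no dot after the @
print(looks_like_email("user name@example.com"))  # False: contains a space
print(looks_like_email("a@b@c.d"))                # True: the pattern is permissive
```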

URL Detection

Pattern: https?://[^\s]+

https? — "http" with an optional "s"
:// — Literal "://"
[^\s]+ — One or more non-whitespace characters
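In practice this pattern tends to grab trailing punctuation, because a period or comma is also "not whitespace". The Python sketch below uses an invented sentence to show the effect, plus one simple (and admittedly blunt) cleanup step.

```python
import re

# Made-up sentence; note the period right after the second URL.
text = "Docs live at https://example.com/docs and http://example.org/faq?id=7."

urls = re.findall(r"https?://[^\s]+", text)

# Assumption: trailing punctuation is never part of the URL we want.
cleaned = [u.rstrip(".,;:!?)") for u in urls]

print(urls)     # ['https://example.com/docs', 'http://example.org/faq?id=7.']
print(cleaned)  # ['https://example.com/docs', 'http://example.org/faq?id=7']
```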

Extracting Hashtags

Pattern: #\w+

# — Literal hash
\w+ — One or more word characters

This matches "#coding", "#100DaysOfCode", etc.
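A short Python sketch of hashtag extraction, using a made-up post. The stray "#" at the end is there to show that \w+ needs at least one word character to follow the hash.

```python
import re

post = "Day 12 of #100DaysOfCode, still loving #coding! #"  # invented text

tags = re.findall(r"#\w+", post)

print(tags)  # ['#100DaysOfCode', '#coding']
```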

Matching Dates

Pattern: \d{1,2}/\d{1,2}/\d{2,4}

Matches dates like "1/1/24", "12/31/2024", "3/15/99".
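Keep in mind that this pattern checks the shape of a date, not whether the date exists. The Python sketch below (with an invented sentence) finds the three examples above and then shows a nonsense "date" that still fits.

```python
import re

DATE = re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}")

found = DATE.findall("Due 1/1/24, renewed 12/31/2024, archived 3/15/99.")

# Shape only: month 99, day 99, and a 3-digit year all pass.
nonsense_ok = DATE.fullmatch("99/99/999") is not None

print(found)        # ['1/1/24', '12/31/2024', '3/15/99']
print(nonsense_ok)  # True
```

If you need real validation, one option is to use the regex to find candidates and then hand them to a date parser.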

Validating Hex Colors

Pattern: ^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$

Matches "#FFF" or "#FFFFFF" format hex colors.
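As a sketch, the pattern could be wrapped in a helper like this (the name is_hex_color is made up for the example):

```python
import re

HEX_COLOR = re.compile(r"^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$")

def is_hex_color(value: str) -> bool:
    """True for #RGB or #RRGGBB, case-insensitive hex digits."""
    return HEX_COLOR.fullmatch(value) is not None

print(is_hex_color("#FFF"))     # True
print(is_hex_color("#ffffff"))  # True
print(is_hex_color("#FFFF"))    # False: 4 digits is neither 3 nor 6
print(is_hex_color("FFF"))      # False: missing the leading #
print(is_hex_color("#GGG"))     # False: G is not a hex digit
```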

Common Mistakes and Gotchas

Forgetting to Escape Special Characters

These characters have special meaning in regex: . * + ? ^ $ { } [ ] \ | ( )

If you want to match a literal period, you need to escape it: \.

Pattern: example.com     (WRONG - also matches "exampleXcom")
Pattern: example\.com    (RIGHT - matches "example.com")
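Here is the difference in Python. When the text you want to match comes from a variable, the standard library's re.escape can add the backslashes for you.

```python
import re

unescaped = re.search(r"example.com", "exampleXcom")  # the dot matches "X"
escaped = re.search(r"example\.com", "exampleXcom")   # None: needs a real "."

# re.escape backslash-escapes regex metacharacters in a literal string.
auto = re.escape("example.com")
literal_hit = re.search(auto, "visit example.com today")

print(unescaped is not None, escaped is None)  # True True
print(auto)                                    # example\.com
```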

Greedy vs Lazy Matching

By default, quantifiers are "greedy": they match as much as possible:

Pattern: <.*>
Text: "<b>bold</b>"
Matches: "<b>bold</b>" (the entire string!)

The .* gobbles up everything between the first < and the LAST >. If you want the shortest match, add ? to make it "lazy":

Pattern: <.*?>
Text: "<b>bold</b>"
Matches: "<b>" and "</b>" (separately)
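You can see both behaviors side by side in this Python sketch. The third pattern is an alternative added here, not something from the examples above: a negated character class that says "anything except >", which gives the same result as the lazy version in this case.

```python
import re

html = "<b>bold</b>"

greedy = re.findall(r"<.*>", html)      # grabs as much as possible
lazy = re.findall(r"<.*?>", html)       # grabs as little as possible
negated = re.findall(r"<[^>]*>", html)  # stops at the first ">"

print(greedy)   # ['<b>bold</b>']
print(lazy)     # ['<b>', '</b>']
print(negated)  # ['<b>', '</b>']
```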

Forgetting Anchors in Validation

Pattern: \d{5}
Text: "My zip is 123456789"
Matches: "12345" (the first five digits of a longer number!)

If you're validating that a string IS a 5-digit zip code:

Pattern: ^\d{5}$

Now it only matches if the ENTIRE string is exactly 5 digits.
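A Python sketch of the difference follows. It also shows one Python-specific wrinkle: $ is allowed to match just before a trailing newline, so for strict validation re.fullmatch is usually the safer choice. The helper name is_zip is invented for this example.

```python
import re

unanchored = re.search(r"\d{5}", "My zip is 123456789")  # finds "12345"
anchored = re.search(r"^\d{5}$", "My zip is 123456789")  # None

# In Python, $ also matches right before a final "\n".
newline_slips_through = re.search(r"^\d{5}$", "12345\n") is not None

def is_zip(value: str) -> bool:
    """Strict check: the entire string must be exactly 5 digits."""
    return re.fullmatch(r"\d{5}", value) is not None

print(unanchored.group(), anchored)             # 12345 None
print(newline_slips_through)                    # True
print(is_zip("12345"), is_zip("12345\n"))       # True False
```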

Practice, Practice, Practice

Regex is a skill that improves with practice. Use our Regex Tester to experiment:

Write a pattern
See matches highlighted in real-time
Test against multiple strings
Debug why something isn't matching

Start with simple patterns and build up complexity. When you're stuck, break the problem down: what are you trying to match, exactly?

Quick Reference

Character Classes:

.     any character (except newline)
\d    digit [0-9]
\w    word char [a-zA-Z0-9_]
\s    whitespace
[abc] a, b, or c
[^abc] NOT a, b, or c

Quantifiers:

*     0 or more
+     1 or more
?     0 or 1
{n}   exactly n
{n,m} between n and m

Anchors:

^     start of string
$     end of string
\b    word boundary

Groups:

(...)   capturing group
(?:...) non-capturing group
|       alternation (or)
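The non-capturing group is the one entry here that hasn't appeared in an example yet. In Python it makes a visible difference with re.findall, which returns the captured group when there is one and the full match otherwise. A small sketch with a made-up string:

```python
import re

laughs = "haha hahaha"  # invented sample

# Capturing group: findall returns the LAST repetition captured by group 1.
captured = re.findall(r"(ha)+", laughs)

# Non-capturing group: findall returns the whole match.
whole = re.findall(r"(?:ha)+", laughs)

print(captured)  # ['ha', 'ha']
print(whole)     # ['haha', 'hahaha']
```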

Beyond the Basics

This guide covers the fundamentals, but regex goes much deeper. Once you're comfortable with these concepts, you can explore:

Lookahead and lookbehind assertions
Backreferences
Named capture groups
Regex flags (case-insensitive, multiline, etc.)
Unicode support

But don't worry about those yet. Master the basics first. You'll be surprised how much you can accomplish with just the patterns in this guide.

Now go practice. Our Regex Tester is waiting.

Tools mentioned in this article:

Regex Tester. Test and debug regular expressions
