Question
In C#, how can a string be converted to a byte[] in .NET without manually choosing an encoding?
I am planning to encrypt the string. Although encryption APIs may sometimes work with text directly, I want to understand why encoding is involved when converting text to bytes.
Why does encoding matter here at all? Can't I just retrieve the raw bytes that the string is already stored as internally? Why is converting a string to bytes dependent on character encoding?
Short Answer
By the end of this page, you will understand that a C# string is text, not an arbitrary byte sequence, and converting it to byte[] always requires an encoding such as UTF-8 or UTF-16. You will learn how .NET stores strings internally, why there is no single universal byte representation for text, how to convert strings to bytes consistently, and what to use when preparing text for encryption, storage, or network transfer.
Concept
A string in C# represents characters, not raw bytes.
When you want a byte[], .NET must answer a question:
- Which bytes should represent those characters?
That question is exactly what character encoding solves.
Why encoding is required
The same text can have different byte representations depending on the encoding.
For example, the text Hello has a different byte sequence in each of these encodings:
- UTF-8
- UTF-16
- ASCII
- UTF-32
Each encoding turns characters into bytes differently.
Example
The character A might be:
- UTF-8: 41
- UTF-16 little-endian: 41 00
- UTF-32 little-endian: 41 00 00 00
So when you ask for bytes, .NET cannot guess which representation you want unless you specify one.
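You can verify this yourself. Here is a minimal sketch (the class name Demo is illustrative):
using System;
using System.Text;
class Demo
{
    static void Main()
    {
        string text = "A";
        // Each encoding produces a different byte sequence for the same character.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(text)));    // 41
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(text))); // 41-00
        Console.WriteLine(BitConverter.ToString(Encoding.UTF32.GetBytes(text)));   // 41-00-00-00
    }
}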
How .NET stores strings internally
In .NET, a string is stored in memory as a sequence of UTF-16 code units. That is an implementation detail of how the runtime keeps text in memory.
But that internal representation is an implementation detail, not a portable text encoding choice. The moment text needs to leave the runtime, as file contents, a network payload, or encryption input, you must still pick an explicit encoding such as UTF-8.
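A small sketch that peeks at those UTF-16 code units (uses top-level statements, so it assumes C# 9 or later):
using System;
string s = "Aé";
// Each char in a .NET string is one UTF-16 code unit.
foreach (char c in s)
    Console.WriteLine($"'{c}' -> 0x{(int)c:X4}");
// 'A' -> 0x0041
// 'é' -> 0x00E9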
Mental Model
Think of a string as a sentence written on a whiteboard.
The sentence is the meaning: the characters and words.
An encoding is the rulebook for copying that sentence into a sequence of numbered boxes.
Different rulebooks produce different box contents:
- UTF-8 uses one set of rules
- UTF-16 uses another
- ASCII uses a limited older set
The whiteboard text is not "already a file format." It is just text in memory.
So asking for the bytes of a string without an encoding is like asking:
"Write this sentence into boxes, but don't tell me which writing system to use."
The runtime needs that rulebook.
Another useful mental model:
- string = human-readable text
- byte[] = raw machine data
- encoding = translator between the two
Without the translator, the conversion is ambiguous.
Syntax and Examples
Core syntax
using System.Text;
string text = "Hello";
byte[] bytes = Encoding.UTF8.GetBytes(text);
string decoded = Encoding.UTF8.GetString(bytes);
Example: converting text to bytes
using System;
using System.Text;
class Program
{
    static void Main()
    {
        string text = "Hello, café";

        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        byte[] unicodeBytes = Encoding.Unicode.GetBytes(text); // UTF-16 LE in .NET

        Console.WriteLine("UTF-8 bytes: " + BitConverter.ToString(utf8Bytes));
        Console.WriteLine("UTF-16 bytes: " + BitConverter.ToString(unicodeBytes));
    }
}
This prints different byte sequences for the same string because different encodings are being used.
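For Hello, café, the output is:
UTF-8 bytes: 48-65-6C-6C-6F-2C-20-63-61-66-C3-A9
UTF-16 bytes: 48-00-65-00-6C-00-6C-00-6F-00-2C-00-20-00-63-00-61-00-66-00-E9-00
Note how é becomes C3-A9 in UTF-8 but E9-00 in UTF-16, so the arrays differ in both length and content.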
Example: preparing text for encryption
using System.Text;
string plaintext = "Secret message";
byte[] plaintextBytes = Encoding.UTF8.GetBytes(plaintext);
// Encrypt plaintextBytes here
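To make the flow concrete, here is a hedged sketch of that encryption step using AES (assumes .NET 6 or later for EncryptCbc; real code would manage the key and IV deliberately rather than generating them inline):
using System;
using System.Security.Cryptography;
using System.Text;

string plaintext = "Secret message";

// Text -> bytes: the encoding choice happens here.
byte[] plaintextBytes = Encoding.UTF8.GetBytes(plaintext);

using Aes aes = Aes.Create(); // random key and IV, for the sketch only
byte[] ciphertext = aes.EncryptCbc(plaintextBytes, aes.IV);

// Base64 turns binary ciphertext into a string-safe form.
Console.WriteLine(Convert.ToBase64String(ciphertext));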
Step by Step Execution
Consider this code:
using System;
using System.Text;
string text = "Aé";
byte[] bytes = Encoding.UTF8.GetBytes(text);
Console.WriteLine(BitConverter.ToString(bytes));
Step by step
1. string text = "Aé";
A string containing two characters is created:
Aé
These are characters, not output bytes yet.
2. Encoding.UTF8.GetBytes(text);
.NET converts each character into UTF-8 bytes.
- A becomes 41
- é becomes C3-A9
So the final byte array becomes:
41-C3-A9
3. BitConverter.ToString(bytes)
This formats the byte array as a readable hexadecimal string, so the program prints 41-C3-A9.
Real World Use Cases
Encoding-aware string-to-byte conversion is used everywhere in real programs.
1. Encrypting user input
Before encrypting a password hint, note, or token payload, text is converted to bytes:
byte[] data = Encoding.UTF8.GetBytes(userMessage);
2. Writing text to files
A file is just bytes. If you save text, you must choose how characters become bytes.
File.WriteAllText("notes.txt", text, Encoding.UTF8);
3. Sending JSON over HTTP
HTTP bodies are bytes, even if the content is text.
var content = new StringContent(json, Encoding.UTF8, "application/json");
4. Storing text in databases or caches
Data may move across systems with different defaults. Explicit encoding avoids corruption.
5. Hashing text consistently
If you hash text, the bytes must be consistent or the hash changes.
byte[] input = Encoding.UTF8.GetBytes(text);
Different encodings produce different hashes for the same visible text.
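A minimal sketch demonstrating this (assumes .NET 5 or later for SHA256.HashData and Convert.ToHexString):
using System;
using System.Security.Cryptography;
using System.Text;

string text = "café";

// Same visible text, different encodings, different hashes.
byte[] utf8Hash = SHA256.HashData(Encoding.UTF8.GetBytes(text));
byte[] utf16Hash = SHA256.HashData(Encoding.Unicode.GetBytes(text));

Console.WriteLine(Convert.ToHexString(utf8Hash));
Console.WriteLine(Convert.ToHexString(utf16Hash)); // differs from the line above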
Real Codebase Usage
In real projects, developers usually do not avoid choosing an encoding. Instead, they make the choice explicit and consistent.
Common patterns
Use UTF-8 as the default text encoding
UTF-8 is the most common choice for APIs, config files, JSON, logs, and cross-platform systems.
byte[] bytes = Encoding.UTF8.GetBytes(value);
Keep encode/decode paired
If one part writes with UTF-8, the reading part should also use UTF-8.
byte[] bytes = Encoding.UTF8.GetBytes(text);
string textAgain = Encoding.UTF8.GetString(bytes);
Validate input before conversion
Guard clauses are common when handling nullable or empty values.
if (text == null)
    throw new ArgumentNullException(nameof(text));

byte[] bytes = Encoding.UTF8.GetBytes(text);
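On .NET 6 and later, the same guard is often written with the built-in helper:
ArgumentNullException.ThrowIfNull(text);
byte[] bytes = Encoding.UTF8.GetBytes(text);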
Centralize encoding decisions
Larger codebases often define a shared encoding policy.
using System.Text;

public static class TextEncoding
{
    public static readonly Encoding Default = Encoding.UTF8;
}
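Call sites then use the shared policy instead of naming an encoding directly:
byte[] bytes = TextEncoding.Default.GetBytes(value);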
Common Mistakes
1. Assuming a string already has one obvious byte representation
A string is text, not a fixed byte array.
Mistake
// Conceptually wrong: there is no encoding-free text-to-bytes conversion
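A frequently suggested workaround illustrates the trap. Copying the string's in-memory UTF-16 code units "works", but only because UTF-16 is itself an encoding choice, here made implicitly and fragilely:
byte[] bytes = new byte[text.Length * sizeof(char)];
Buffer.BlockCopy(text.ToCharArray(), 0, bytes, 0, bytes.Length);
// Whoever reads these bytes must somehow know they are UTF-16 code units.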
Fix
Choose an encoding explicitly:
byte[] bytes = Encoding.UTF8.GetBytes(text);
2. Encoding with UTF-8 and decoding with UTF-16
Using different encodings in each direction corrupts data.
Broken code
byte[] bytes = Encoding.UTF8.GetBytes("café");
string text = Encoding.Unicode.GetString(bytes);
Fix
Use the same encoding both ways:
byte[] bytes = Encoding.UTF8.GetBytes("café");
string text = Encoding.UTF8.GetString(bytes);
3. Using ASCII for non-ASCII text
ASCII cannot represent many characters like é, 中, or emoji.
Broken code
byte[] bytes = Encoding.ASCII.GetBytes("café"); // é is silently replaced with '?' (0x3F)
Fix
Use an encoding that can represent the characters:
byte[] bytes = Encoding.UTF8.GetBytes("café");
Comparisons
Common encoding choices in C#
| Encoding | Bytes for A | Supports Unicode well? | Common use | Notes |
|---|---|---|---|---|
| ASCII | 41 | No | Legacy text | Only supports 0–127 reliably |
| UTF-8 | 41 | Yes | Web, APIs, files, JSON | Most common modern choice |
| UTF-16 (Encoding.Unicode) | 41-00 | Yes | In-memory .NET string model, some Windows APIs | Two bytes per code unit; some characters need two code units |
| UTF-32 | 41-00-00-00 | Yes | Rare, specialized text processing | Four bytes per code point; largest output |
Cheat Sheet
Quick reference
Convert string to bytes
byte[] bytes = Encoding.UTF8.GetBytes(text);
Convert bytes to string
string text = Encoding.UTF8.GetString(bytes);
Get UTF-16 bytes
byte[] bytes = Encoding.Unicode.GetBytes(text);
Convert encrypted bytes to safe string form
string base64 = Convert.ToBase64String(encryptedBytes);
Convert Base64 string back to bytes
byte[] bytes = Convert.FromBase64String(base64);
Rules to remember
- A C# string is text, not raw binary data.
- Converting text to bytes always needs an encoding.
- Use the same encoding for encoding and decoding.
- UTF-8 is usually the best default.
- Do not treat arbitrary binary data as text.
- Use Base64 when binary data must be stored in a string.
Common choices
Encoding.UTF8 -> best default for most applications
FAQ
Why can't I convert a C# string to bytes without an encoding?
Because a string represents characters, not a unique byte sequence. An encoding defines how those characters are turned into bytes.
Does .NET store strings internally as UTF-16?
Yes, .NET strings use UTF-16 code units internally. But that internal storage format is not the same as choosing a portable external byte format.
Should I use UTF-8 or UTF-16 in C#?
Usually use UTF-8 for files, APIs, encryption input, and network communication. Use UTF-16 only when a specific API or format requires it.
If .NET uses UTF-16 internally, can I just use Encoding.Unicode?
You can, but that is still an explicit encoding choice. It is not the same as avoiding encoding. It is often less portable than UTF-8.
Why does encryption require bytes instead of strings?
Encryption algorithms operate on binary data. If your original data is text, you must first encode it into bytes.
Can I store encrypted bytes in a string?
Not safely as regular text. Use Base64 to convert encrypted bytes into a string-safe representation.
Will the same text always produce the same bytes?
Only if you use the same encoding. Different encodings produce different byte sequences for the same text.
Mini Project
Description
Build a small C# console program that shows how the same string becomes different byte arrays depending on the encoding. This project helps reinforce that text and bytes are not the same thing, which is especially important before hashing, encryption, file writing, or sending data over a network.
Goal
Create a program that converts one string into UTF-8 and UTF-16 byte arrays, prints both results, and decodes the bytes back into the original text.
Requirements
- Create a string containing at least one non-ASCII character such as é or 中.
- Convert the string to bytes using both Encoding.UTF8 and Encoding.Unicode.
- Print the byte arrays in hexadecimal form.
- Decode both byte arrays back into strings using the matching encoding.
- Show that the original text is preserved when the correct encoding is used.