Leetcode #393: UTF-8 Validation

In this guide, we solve Leetcode #393 UTF-8 Validation in Python and focus on the core idea that makes the solution efficient.

You will see the intuition, the step-by-step method, and a clean Python implementation you can use in interviews.

Problem Statement

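Given an integer array data in which each element represents one byte (only the least significant 8 bits of each integer are used), return whether it is a valid UTF-8 encoding, i.e. whether it translates to a sequence of valid UTF-8 encoded characters. A character in UTF-8 is from 1 to 4 bytes long.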

Quick Facts

Difficulty: Medium
Premium: No
Tags: Bit Manipulation, Array

Intuition

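In UTF-8, the leading bits of every byte announce its role. A byte starting with 0 is a complete 1-byte character; a byte starting with 110, 1110, or 11110 opens a 2-, 3-, or 4-byte character; a byte starting with 10 is a continuation byte that belongs to a character opened earlier.

That means the only state we need is a single counter: how many continuation bytes are still owed. Each check is a shift and a comparison, so it runs in constant time and needs no extra memory.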
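As a minimal sketch of this idea, the helper below labels a single byte by its leading bits. The name classify_byte and the label strings are hypothetical, chosen only for illustration; the input is assumed to be in the range 0 to 255.

```python
def classify_byte(b: int) -> str:
    """Label one byte (0-255) by its leading bits. Hypothetical helper."""
    if b >> 7 == 0b0:
        return "single"        # 0xxxxxxx
    if b >> 6 == 0b10:
        return "continuation"  # 10xxxxxx
    if b >> 5 == 0b110:
        return "lead-2"        # 110xxxxx
    if b >> 4 == 0b1110:
        return "lead-3"        # 1110xxxx
    if b >> 3 == 0b11110:
        return "lead-4"        # 11110xxx
    return "invalid"           # 11111xxx


# 0x41 is 01000001, 0xC3 is 11000011, 0xA9 is 10101001, 0xFF is 11111111
roles = [classify_byte(b) for b in (0x41, 0xC3, 0xA9, 0xFF)]
print(roles)  # ['single', 'lead-2', 'continuation', 'invalid']
```

Shifting right by 7, 6, 5, 4, or 3 keeps only the leading bits, so each comparison tests exactly one prefix.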

Approach

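Scan the array once, keeping a counter cnt of continuation bytes that are still expected. Right shifts isolate the leading bits of each byte so they can be compared against the UTF-8 prefixes.

Steps:

If cnt > 0, the current byte must start with 10; if it does not, return False. Otherwise decrement cnt.
If cnt == 0, read the leading bits to decide how many continuation bytes must follow (0, 1, 2, or 3). Any other prefix (10 or 11111) is invalid.
After the last byte, return whether cnt == 0, so that a sequence cut off at the end is rejected.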
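The steps above can also be sketched in a table-driven form that uses AND masks instead of shifts. This is an illustrative variant, not the solution given in this guide; the names LEAD_PATTERNS, valid_utf8, and pending are assumptions made for the example.

```python
from typing import List

# (mask, expected pattern, continuation bytes that must follow)
LEAD_PATTERNS = [
    (0b10000000, 0b00000000, 0),  # 0xxxxxxx
    (0b11100000, 0b11000000, 1),  # 110xxxxx
    (0b11110000, 0b11100000, 2),  # 1110xxxx
    (0b11111000, 0b11110000, 3),  # 11110xxx
]


def valid_utf8(data: List[int]) -> bool:
    pending = 0  # continuation bytes still expected
    for byte in data:
        if pending:
            # Inside a multi-byte character: the byte must look like 10xxxxxx
            if (byte & 0b11000000) != 0b10000000:
                return False
            pending -= 1
            continue
        for mask, pattern, follow in LEAD_PATTERNS:
            if (byte & mask) == pattern:
                pending = follow
                break
        else:
            # No lead pattern matched: stray 10xxxxxx or 11111xxx
            return False
    return pending == 0


print(valid_utf8([197, 130, 1]))  # True: 11000101 10000010 00000001
print(valid_utf8([235, 140, 4]))  # False: third byte 00000100 is not a continuation
```

The table keeps the four prefixes in one place, at the cost of a small inner loop per lead byte.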

Example

Number of Bytes | UTF-8 Octet Sequence (binary)
----------------+------------------------------------
       1        | 0xxxxxxx
       2        | 110xxxxx 10xxxxxx
       3        | 1110xxxx 10xxxxxx 10xxxxxx
       4        | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
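To see these patterns on real characters, the sketch below uses Python's built-in str.encode to produce UTF-8 bytes and prints each byte in binary. The helper name bit_patterns is hypothetical; the four sample characters were picked so that each row of the table appears once.

```python
from typing import List


def bit_patterns(text: str) -> List[str]:
    """Return the UTF-8 bytes of text as 8-character binary strings."""
    return [format(b, "08b") for b in text.encode("utf-8")]


# U+0041 'A', U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (emoji)
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(len(bit_patterns(ch)), bit_patterns(ch))
# 1 ['01000001']
# 2 ['11000011', '10101001']
# 3 ['11100010', '10000010', '10101100']
# 4 ['11110000', '10011111', '10011000', '10000000']
```

In every case the first byte carries the length prefix and each following byte starts with 10.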

Python Solution

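from typing import List


class Solution:
    def validUtf8(self, data: List[int]) -> bool:
        # cnt = number of continuation bytes (10xxxxxx) still expected
        cnt = 0
        for v in data:
            if cnt > 0:
                # Inside a multi-byte character: byte must start with 10
                if v >> 6 != 0b10:
                    return False
                cnt -= 1
            elif v >> 7 == 0:
                # 0xxxxxxx: complete 1-byte character
                cnt = 0
            elif v >> 5 == 0b110:
                # 110xxxxx: lead byte of a 2-byte character
                cnt = 1
            elif v >> 4 == 0b1110:
                # 1110xxxx: lead byte of a 3-byte character
                cnt = 2
            elif v >> 3 == 0b11110:
                # 11110xxx: lead byte of a 4-byte character
                cnt = 3
            else:
                # Stray continuation byte (10xxxxxx) or invalid lead (11111xxx)
                return False
        # A character cut off at the end of the array leaves cnt > 0
        return cnt == 0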

Complexity

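The time complexity is $O(n)$, where $n$ is the length of the array data, because each byte is examined once with a constant number of shifts and comparisons. The space complexity is $O(1)$, since the only state is the counter.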

Edge Cases and Pitfalls

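Three cases are easy to miss. A character that is cut off at the end of the array (for example a lone 110xxxxx byte) must be rejected, which is why the function returns cnt == 0 rather than True. A continuation byte (10xxxxxx) that appears when no multi-byte character is open is invalid. A byte starting with five 1 bits (11111xxx) is never a valid lead byte, because a character is at most 4 bytes long. Note also that this check validates only the bit prefixes from the table; it does not reject overlong encodings or out-of-range code points the way a full UTF-8 decoder would.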
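A few concrete inputs make the pitfalls for this problem easier to check. The sketch below restates the single-pass check as a standalone function, valid_utf8 (a hypothetical name, repeated here only so the block runs on its own), and asserts the expected result for each case. The case labels and byte values are illustrative choices, not taken from the problem's test set.

```python
from typing import List


def valid_utf8(data: List[int]) -> bool:
    """Standalone restatement of the single-pass prefix check."""
    cnt = 0  # continuation bytes still expected
    for v in data:
        if cnt:
            if v >> 6 != 0b10:
                return False
            cnt -= 1
        elif v >> 7 == 0:
            cnt = 0
        elif v >> 5 == 0b110:
            cnt = 1
        elif v >> 4 == 0b1110:
            cnt = 2
        elif v >> 3 == 0b11110:
            cnt = 3
        else:
            return False
    return cnt == 0


cases = {
    "empty input leaves nothing open": ([], True),
    "truncated 2-byte character": ([0b11000010], False),
    "3-byte lead with one continuation missing": ([0b11100010, 0b10000010], False),
    "stray continuation byte": ([0b10000000], False),
    "lead byte with five 1 bits": ([0b11111000, 0x80, 0x80, 0x80, 0x80], False),
    "complete 3-byte character": ([0xE2, 0x82, 0xAC], True),
}

for name, (data, expected) in cases.items():
    assert valid_utf8(data) == expected, name
print("all edge cases behave as expected")
```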

Summary

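The solution reads the leading bits of each byte, tracks the number of continuation bytes still expected in a single counter, and validates the array in one pass with $O(1)$ extra space, which keeps the implementation short enough to write out in an interview.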
