C语言实现字符串重复字符查找与输出：从基础到高效优化346

在C语言中处理字符串是日常编程中的一项基本任务，而“输出重复字母”这个需求，看似简单，实则可以引申出多种实现方式和优化策略。作为一名专业的程序员，我们将深入探讨如何在C语言中高效、灵活地实现这一功能，从最直观的暴力解法，到利用数据结构优化的方案，再到对不同场景的适应性调整。

在日常的文本处理和数据分析中，我们经常会遇到需要找出字符串中重复出现的字符的需求。例如，在用户输入校验、密码强度检测、数据去重或特定的算法竞赛问题中，准确且高效地定位并输出字符串中的重复字母是至关重要的一步。C语言以其底层特性和对内存的直接操作能力，为我们提供了实现这些功能的高效途径。本文将深入探讨在C语言环境下，如何从基础出发，逐步优化，实现字符串重复字符的查找与输出。

“重复字母”这个概念本身就具有一定的歧义，它可能指：
找出所有重复出现的字符，并每个字符只输出一次（例如 "banana" -> 'a', 'n'）。
找出所有重复出现的字符，并按其在原字符串中首次出现的顺序输出（例如 "abracadabra" -> 'a', 'b', 'r'）。
找出第一个重复出现的字符（例如 "abracadabra" -> 'a'）。
输出字符串中所有重复出现的字符（例如 "banana" -> 'a', 'n', 'a', 'n', 'a' - 这种需求较少，一般会去重）。

为了全面覆盖，本文将主要聚焦于第一种和第二种场景，并兼顾其他变体。

1. 基础概念：C语言中的字符串表示

在C语言中，字符串是以字符数组的形式存储的，并以空字符 `\0` 作为结束标志。例如，`char str[] = "hello";` 定义了一个包含 'h', 'e', 'l', 'l', 'o', '\0' 六个字符的数组。理解这一点是进行字符串操作的基础。#include <stdio.h>
#include <string.h> // 包含strlen等字符串处理函数
#include <stdlib.h> // 包含tolower等字符处理函数
// 函数声明，用于后续代码组织
void find_and_print_duplicates_naive(const char* str);
void find_and_print_duplicates_optimized(const char* str);
void find_and_print_duplicates_order_preserving(const char* str);

2. 朴素解法：嵌套循环遍历（O(N^2)）

最直观的思路是使用两重循环。外层循环遍历字符串中的每一个字符，内层循环则从当前字符的下一个位置开始，检查是否有与外层字符相同的字符。如果找到，说明该字符是重复的。为了避免重复输出同一个重复字符（例如 "apple"，'p' 重复了两次，我们只想输出一次 'p'），我们需要一个额外的机制来记录已经处理过的重复字符。/
* @brief 朴素解法：使用嵌套循环查找并输出重复字母。
* 时间复杂度 O(N^2)，空间复杂度 O(1) (忽略输出存储)。
* 输出顺序不保证与原始字符串中重复字符首次出现的顺序一致。
* @param str 要处理的字符串。
*/
void find_and_print_duplicates_naive(const char* str) {
if (str == NULL || strlen(str) == 0) {
printf("字符串为空或无效。");
return;
}
int len = strlen(str);
char printed_duplicates[256] = {0}; // 用作一个简单的哈希表，标记已打印的重复字符
// 假定ASCII字符集
int printed_count = 0;
printf("朴素解法找到的重复字母：");
for (int i = 0; i < len; i++) {
// 检查当前字符是否已经标记为已打印的重复字符
// 注意：这种检查的效率取决于printed_duplicates的结构
// 如果printed_duplicates[str[i]] != 0，则说明该字符已处理过
if (printed_duplicates[(unsigned char)str[i]] != 0) {
continue; // 已经处理过并输出了这个重复字符
}
int is_duplicate = 0;
for (int j = i + 1; j < len; j++) {
if (str[i] == str[j]) {
is_duplicate = 1;
break; // 找到一个重复项就足够了
}
}
if (is_duplicate) {
printf("'%c' ", str[i]);
printed_duplicates[(unsigned char)str[i]] = 1; // 标记为已打印
printed_count++;
}
}
if (printed_count == 0) {
printf("无");
}
printf("");
}

复杂度分析：
时间复杂度： 外层循环执行N次，内层循环平均执行N/2次，总共大约 `N * N/2` 次比较，因此是 O(N^2)。对于长字符串，这种方法效率非常低下。
空间复杂度： `printed_duplicates` 数组的大小是固定的256（针对ASCII字符集），与输入字符串长度无关，因此是 O(1)。

3. 优化解法：利用字符计数数组（O(N)）

为了提高效率，我们可以利用一个辅助数组（可以看作一个简易的哈希表）来记录每个字符出现的次数。由于标准ASCII字符集只有256个字符，我们可以声明一个大小为256的整型数组，数组的索引代表字符的ASCII值，数组的值代表该字符出现的次数。这个方法只需要遍历字符串两次：第一次遍历统计字符频率，第二次遍历根据频率判断并输出重复字符。#include <ctype.h> // for tolower/toupper if needed
/
* @brief 优化解法：使用字符计数数组查找并输出重复字母。
* 时间复杂度 O(N)，空间复杂度 O(1) (固定大小的计数数组)。
* 输出顺序不保证与原始字符串中重复字符首次出现的顺序一致，而是按ASCII值顺序。
* @param str 要处理的字符串。
*/
void find_and_print_duplicates_optimized(const char* str) {
if (str == NULL || strlen(str) == 0) {
printf("字符串为空或无效。");
return;
}
int char_counts[256] = {0}; // 初始化所有字符计数为0
int len = strlen(str);
// 第一次遍历：统计每个字符的出现次数
for (int i = 0; i < len; i++) {
// 可以选择在这里进行大小写转换，例如 tolower(str[i])
// char_counts[(unsigned char)tolower(str[i])]++;
char_counts[(unsigned char)str[i]]++;
}
printf("优化解法找到的重复字母：");
int printed_count = 0;
// 第二次遍历：根据计数数组判断并输出重复字符
for (int i = 0; i < 256; i++) {
if (char_counts[i] > 1) { // 如果某个字符的计数大于1，则它是重复的
printf("'%c' ", (char)i);
printed_count++;
}
}
if (printed_count == 0) {
printf("无");
}
printf("");
}

复杂度分析：
时间复杂度： 第一次遍历字符串 O(N)，第二次遍历固定大小的计数数组 O(256) (常数时间)。总共是 O(N + 256)，简化为 O(N)。这是非常高效的。
空间复杂度： `char_counts` 数组的大小固定为256，因此是 O(1)。

处理大小写： 如果我们希望忽略大小写（例如 'A' 和 'a' 算作同一个重复字符），可以在统计时将所有字符统一转换为小写或大写：`char_counts[(unsigned char)tolower(str[i])]++;`。

4. 进阶需求：保持输出顺序的重复字母查找（O(N)）

上述优化解法虽然高效，但输出的重复字符是按照它们的ASCII值顺序排列的，而不是它们在原始字符串中首次出现的顺序。如果需要保持输出顺序，我们可以在第二次遍历时再次遍历原字符串，并结合计数数组和已输出标记来实现。/
* @brief 保持输出顺序的解法：使用字符计数数组和已打印标记查找并输出重复字母。
* 时间复杂度 O(N)，空间复杂度 O(1) (固定大小的计数数组)。
* 输出顺序与原始字符串中重复字符首次出现的顺序一致。
* @param str 要处理的字符串。
*/
void find_and_print_duplicates_order_preserving(const char* str) {
if (str == NULL || strlen(str) == 0) {
printf("字符串为空或无效。");
return;
}
int char_counts[256] = {0}; // 统计字符出现次数
int already_printed[256] = {0}; // 标记是否已经打印过该重复字符
int len = strlen(str);
// 第一次遍历：统计每个字符的出现次数
for (int i = 0; i < len; i++) {
char_counts[(unsigned char)str[i]]++;
}
printf("按顺序输出的重复字母：");
int printed_count = 0;
// 第二次遍历：再次遍历原始字符串，并根据计数判断和已打印标记输出
for (int i = 0; i < len; i++) {
unsigned char current_char = str[i];
if (char_counts[current_char] > 1 && already_printed[current_char] == 0) {
printf("'%c' ", current_char);
already_printed[current_char] = 1; // 标记为已打印
printed_count++;
}
}
if (printed_count == 0) {
printf("无");
}
printf("");
}

复杂度分析：
时间复杂度： 第一次遍历字符串 O(N)，第二次遍历字符串 O(N)。总共是 O(N)。
空间复杂度： `char_counts` 和 `already_printed` 数组的大小都是固定的256，因此是 O(1)。

5. 变体需求与考虑

除了上述基本场景，还有一些相关的变体需求：

5.1 找出第一个重复的字符

这种情况下，我们可以在第一次遍历统计字符频率后，再次遍历原字符串，找到第一个 `char_counts[current_char] > 1` 的字符并立即返回。/
* @brief 查找第一个重复出现的字符。
* @param str 要处理的字符串。
* @return 第一个重复出现的字符，如果没有则返回 '\0'。
*/
char find_first_duplicate(const char* str) {
if (str == NULL || strlen(str) == 0) {
return '\0';
}
int char_counts[256] = {0};
int len = strlen(str);
for (int i = 0; i < len; i++) {
char_counts[(unsigned char)str[i]]++;
}
for (int i = 0; i < len; i++) {
if (char_counts[(unsigned char)str[i]] > 1) {
return str[i];
}
}
return '\0'; // 没有重复字符
}

5.2 找出第一个不重复的字符

与上一个问题类似，但条件是 `char_counts[current_char] == 1`。/
* @brief 查找第一个不重复出现的字符。
* @param str 要处理的字符串。
* @return 第一个不重复出现的字符，如果没有则返回 '\0'。
*/
char find_first_non_duplicate(const char* str) {
if (str == NULL || strlen(str) == 0) {
return '\0';
}
int char_counts[256] = {0};
int len = strlen(str);
for (int i = 0; i < len; i++) {
char_counts[(unsigned char)str[i]]++;
}
for (int i = 0; i < len; i++) {
if (char_counts[(unsigned char)str[i]] == 1) {
return str[i];
}
}
return '\0'; // 没有不重复字符
}

6. 性能考量与最佳实践
字符集： 上述所有 O(N) 的优化方法都依赖于字符集的固定大小（256个ASCII字符）。如果需要处理包含Unicode（如UTF-8）字符的字符串，这种简单的字符计数数组就不再适用。因为Unicode字符编码长度不一，且字符数量庞大。在这种情况下，你需要使用更复杂的哈希映射数据结构（如C++的 `std::map` 或自定义哈希表）来存储字符计数，或者使用专门的Unicode库。但对于C语言面试和一般应用，通常假设是ASCII或扩展ASCII。
空字符串或NULL： 在处理字符串时，始终要检查字符串是否为空指针 (`NULL`) 或空字符串 (`strlen(str) == 0`)，以避免段错误和不必要的计算。
常量字符串： 如果函数不需要修改传入的字符串，最好将参数声明为 `const char*`，这有助于编译器进行类型检查，并明确函数不会有副作用。

7. 完整示例与主函数

为了演示上述函数的用法，我们可以编写一个主函数来调用它们：int main() {
const char* test_str1 = "programming";
const char* test_str2 = "hello world";
const char* test_str3 = "abcdefg";
const char* test_str4 = "aaaaa";
const char* test_str5 = "abracadabra";
const char* test_str6 = "";
const char* test_str7 = NULL;
printf("--- 测试字符串: %s ---", test_str1);
find_and_print_duplicates_naive(test_str1);
find_and_print_duplicates_optimized(test_str1);
find_and_print_duplicates_order_preserving(test_str1);
printf("第一个重复字符: '%c'", find_first_duplicate(test_str1));
printf("第一个不重复字符: '%c'", find_first_non_duplicate(test_str1));
printf("--- 测试字符串: %s ---", test_str2);
find_and_print_duplicates_naive(test_str2);
find_and_print_duplicates_optimized(test_str2);
find_and_print_duplicates_order_preserving(test_str2);
printf("第一个重复字符: '%c'", find_first_duplicate(test_str2));
printf("第一个不重复字符: '%c'", find_first_non_duplicate(test_str2));

printf("--- 测试字符串: %s ---", test_str3);
find_and_print_duplicates_naive(test_str3);
find_and_print_duplicates_optimized(test_str3);
find_and_print_duplicates_order_preserving(test_str3);
printf("第一个重复字符: '%c'", find_first_duplicate(test_str3));
printf("第一个不重复字符: '%c'", find_first_non_duplicate(test_str3));
printf("--- 测试字符串: %s ---", test_str4);
find_and_print_duplicates_naive(test_str4);
find_and_print_duplicates_optimized(test_str4);
find_and_print_duplicates_order_preserving(test_str4);
printf("第一个重复字符: '%c'", find_first_duplicate(test_str4));
printf("第一个不重复字符: '%c'", find_first_non_duplicate(test_str4));
printf("--- 测试字符串: %s ---", test_str5);
find_and_print_duplicates_naive(test_str5);
find_and_print_duplicates_optimized(test_str5);
find_and_print_duplicates_order_preserving(test_str5);
printf("第一个重复字符: '%c'", find_first_duplicate(test_str5));
printf("第一个不重复字符: '%c'", find_first_non_duplicate(test_str5));
printf("--- 测试空字符串: %s ---", test_str6);
find_and_print_duplicates_naive(test_str6);
find_and_print_duplicates_optimized(test_str6);
find_and_print_duplicates_order_preserving(test_str6);
printf("第一个重复字符: '%c'", find_first_duplicate(test_str6));
printf("第一个不重复字符: '%c'", find_first_non_duplicate(test_str6));

printf("--- 测试 NULL 字符串 ---");
find_and_print_duplicates_naive(test_str7);
find_and_print_duplicates_optimized(test_str7);
find_and_print_duplicates_order_preserving(test_str7);
printf("第一个重复字符 (NULL): '%c'", find_first_duplicate(test_str7));
printf("第一个不重复字符 (NULL): '%c'", find_first_non_duplicate(test_str7));
return 0;
}

从简单的两重循环到使用字符计数数组进行优化，C语言提供了多种方法来实现字符串中重复字母的查找与输出。对于大多数ASCII字符集内的字符串处理任务，基于字符计数数组的 O(N) 算法是最佳选择，它在时间效率和空间效率上都表现出色。理解不同算法的复杂度以及它们适用于的场景，是专业程序员必备的技能。在实际应用中，还需要考虑字符集范围、大小写敏感性以及输入校验等因素，以确保程序的健壮性和普适性。

2025-10-14

上一篇：C语言实现完美棱形输出：从入门到精通

下一篇：C语言字符处理函数深度解析：从ctype.h到高级应用