https://www.archives.gov/research/census/microfilm-catalog/1790-1890/part-01
Soundex is a phonetically coded surname index based on the way a surname sounds when spoken in English rather than the way it is spelled. Surnames that sound the same are coded the same even if they are spelled differently, for example, Smith and Smyth. The system was developed so that researchers could find a name even when it had been recorded under different spellings. Soundex assumes a standard English pronunciation of every name. A Soundex code has four characters: a letter followed by three numbers. The letter is always the first letter of the surname, and the remaining letters are assigned numbers according to the Soundex Coding Guide. Here is the Soundex Coding Guide from the U.S. National Archives article entitled "Soundex System, The Soundex Indexing System."
Soundex Coding Guide
Number Represents the Letters
1 B, F, P, V
2 C, G, J, K, Q, S, X, Z
3 D, T
4 L
5 M, N
6 R
Disregard the letters A, E, I, O, U, H, W, and Y.
See also this online Soundex Calculator
So my surname, Tanner, would be coded as T-560. This result follows from a list of Additional Soundex Coding Rules. Here are the additional rules from the U.S. National Archives website:
1. Names With Double Letters
If the surname has any double letters, they should be treated as one letter. For example:
Gutierrez is coded G-362 (G, 3 for the T, 6 for the first R, second R ignored, 2 for the Z).
2. Names with Letters Side-by-Side that have the Same Soundex Code Number
If the surname has different letters side-by-side that have the same number in the soundex coding guide, they should be treated as one letter. Examples:
Pfister is coded as P-236 (P, F ignored, 2 for the S, 3 for the T, 6 for the R).
Jackson is coded as J-250 (J, 2 for the C, K ignored, S ignored, 5 for the N, 0 added).
Tymczak is coded as T-522 (T, 5 for the M, 2 for the C, Z ignored, 2 for the K). Since the vowel "A" separates the Z and K, the K is coded.
3. Names with Prefixes
If a surname has a prefix, such as Van, Con, De, Di, La, or Le, code both with and without the prefix because the surname might be listed under either code. Note, however, that Mc and Mac are not considered prefixes.
For example, VanDeusen might be coded two ways:
V-532 (V, 5 for N, 3 for D, 2 for S)
or
D-250 (D, 2 for the S, 5 for the N, 0 added).
4. Consonant Separators
If a vowel (A, E, I, O, U) separates two consonants that have the same soundex code, the consonant to the right of the vowel is coded. Example:
Tymczak is coded as T-522 (T, 5 for the M, 2 for the C, Z ignored (see "Side-by-Side" rule above), 2 for the K). Since the vowel "A" separates the Z and K, the K is coded.
If "H" or "W" separate two consonants that have the same soundex code, the consonant to the right of the vowel is not coded. Example:
Ashcraft is coded A-261 (A, 2 for the S, C ignored, 6 for the R, 1 for the F). It is not coded A-226.
If this looks complicated, there are a number of free online programs that will take any surname and translate it into the Soundex Code. See Eastman's Online Genealogy Newsletter, Soundex Calculator.
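The coding guide and the additional rules quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program behind any particular online calculator. The names `CODES`, `PREFIXES`, `soundex`, and `soundex_variants` are invented for this example. Treating Y as a separator in the same way as the vowels is an assumption, since the quoted rule names only A, E, I, O, and U. The prefix check is deliberately naive: it matches any name that merely begins with those letters.

```python
# Letter-to-digit table from the Soundex Coding Guide.
CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
        "4": "L", "5": "MN", "6": "R",
    }.items()
    for letter in letters
}

# Prefixes named in rule 3. Mc and Mac are deliberately absent.
PREFIXES = ("VAN", "CON", "DE", "DI", "LA", "LE")


def soundex(surname: str) -> str:
    """Return a code such as 'T-560', or '' if the name has no letters."""
    letters = [ch for ch in surname.upper() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # Code of the most recent consonant; None once a vowel intervenes.
    # Starting from the first letter's own code is what drops the F in Pfister.
    previous = CODES.get(first)
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped and do NOT separate (Ashcraft)
        code = CODES.get(ch)
        if code is None:
            # Vowel, or Y by assumption: disregarded, but acts as a separator.
            previous = None
            continue
        if code != previous:
            digits.append(code)  # doubles and same-code neighbours are skipped
        previous = code
    # Keep three digits, padding with zeros when the name runs out.
    return first + "-" + "".join(digits)[:3].ljust(3, "0")


def soundex_variants(surname: str) -> list:
    """Code the name as written and, per rule 3, without a leading prefix."""
    codes = [soundex(surname)]
    upper = surname.upper()
    for prefix in PREFIXES:
        remainder = surname[len(prefix):]
        # Naive test: any name that starts with these letters counts.
        if upper.startswith(prefix) and soundex(remainder) not in ("", *codes):
            codes.append(soundex(remainder))
    return codes


for name in ("Tanner", "Gutierrez", "Pfister", "Jackson", "Tymczak", "Ashcraft"):
    print(name, soundex(name))
print(soundex_variants("VanDeusen"))  # ['V-532', 'D-250']
```

Run as written, the sketch reproduces the codes given in the examples above, including T-560 for Tanner and both codes for VanDeusen.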
Why do we still need to know about Soundex when we have complete alternative indexes to the U.S. Census Records? Well, the issue of misspelled names in the U.S. Federal Census still exists. One example from my own research was the misspelling of my surname Tanner as Tamer. If you look at the Soundex for both spellings, you will see that they are the same: T-560. What if my surname was misspelled as Turner? Since this is a completely different name and not merely a misspelling, Soundex doesn't help. By the way, the Soundex for Turner is T-656.
Soundex coding also has further rules that pertain to special categories of the index. Here are some additional rules from the U.S. National Archives website.
The first clarification concerns the coding of American Indian and Asian Names. Here is the explanation of how this was handled by the code.
A phonetically spelled American Indian or Asian name was sometimes coded as if it were one name. If a distinguishable surname was given, the name may have been coded in the regular manner. For example, Dances with Wolves might have been coded as Dances (D-522) or as Wolves (W-412), or the name Shinka-Wa-Sa may have been coded as Shinka (S-520) or Sa (S-000).
Another issue with non-English names is that they are more frequently misspelled than English names.
The Soundex indexing system created a card for each family in the Census. Here is a screenshot of some of the cards that were associated with the Soundex Code.
You can barely see the Soundex code at the bottom of the image. All of the Soundex Cards were microfilmed, and the original microfilm copies are in the National Archives, in the Regional Archive System of the National Archives, and at the Family History Library in Salt Lake City, Utah. But now, the films have been digitized. The best source for all of the images is Archive.org. All this makes me thankful for digital images, computers, and string indexing. Here is a drawing of the Soundex Cards from the United States Census Bureau.
Now there are also some complications with the cards. Going back to the U.S. National Archives website, here is the first explanation about the different cards.
The Soundex microfilm rolls for the 1880 census include four different kinds of cards: Family Cards, Other Members of Family Continued Cards, Individual Cards, and Institution Cards. Below the coded surname at the top left of the card, the surname and then first name of the head of the family ordinarily appear as recorded on the schedule. The list at the end of this introduction, Abbreviations and Terms Used in Soundex Cards, is applicable to the 1880, 1900, 1910, and 1920 Soundexes. It can help researchers determine the relationships of persons to the head of the family. The most important information to record is: State or territory; volume, ED, sheet, and line numbers; county, city, and MCD.
This is just the first level of complication. There are other Soundex Cards with more exceptions. Again to the U.S. National Archives website:
Frequently, if families include more than six members, the Family Card is followed by a related card. For very large families, more than one of these cards may appear. Handwritten numbers at the bottom of the cards refer to the first card (e.g., "#2, see #1"). Although the continuation card notes the name of the head of family and name, relationship, age, and birthplace of the other family member, this card excludes other personal information such as color and sex. It also omits most jurisdictional data found on the Family Card such as the county, city, MCD, and ED. Some researchers may need to search for a third kind of Soundex card, an Individual Card. This card contains data only on a child age 10 or under who (1) had a surname different from the head of family, or who (2) was not an immediate member of a family (e.g., stepson or nephew), or who (3) resided in an institution without a family. For the first two purposes, the Individual Card duplicates part of the information on a Family Card; it cross-references a census schedule. The Individual Card ordinarily is the only card referencing a particular child. Institution Cards appear at the end of the last roll of Soundex microfilm for a state or territory.
The Institution Cards, unlike the three other Soundex cards, are alphabetically arranged, not Soundex coded, by the first name of the institution. The first Institution card to appear in roll 168 names an institution whose name began with A: Adams County, PA, Poorhouse. The Institution Cards exclude personal data on individuals and, at most, may note only the number of inhabitants.
Institution Cards include jurisdictional data necessary to find the correct census schedules (e.g., state, county, city, and ED). Street and house numbers also often appear on the cards. The cards exclude a printed heading for MCDs, but some indexers inserted this information on the line for city. Also, the cards have no caption for line numbers pertinent to the schedules, but some indexers inserted this information near the line for sheet number.
With all these rules and exceptions, I guess it was unusual that I could find anything when I was using the Soundex Index in book format. Finding the Soundex Microfilm and then locating the correct images is another challenge. Here is an explanation from the U.S. National Archives website:
To use the census soundex to locate information about a person, you must know his or her full name and the state or territory in which he or she lived at the time of the census. It is also helpful to know the full name of the head of the household in which the person lived because census takers recorded information under that name.
Another issue with the Soundex Index occurs when the information given to the enumerator was missing or incomplete. The U.S. National Archives explains this issue under the term "Not Reported Data."
Occasionally, some people gave the enumerator only a surname, without any given or middle name, or the indexer may have found this information missing or illegible. Under these circumstances, Not Reported (NR) or a blank can appear on a card after a surname. Cards with this NR feature appear first within a code. On census schedules, after the surname, some enumerators may have recorded only initials for a person or an initial before the middle name. Such cards are arranged alphabetically and may appear after those with the NR-first name. They ordinarily precede cards with full names bearing the same first letter. The indexers may also have encountered an NR surname, with or without a given name and initials. Cards with an NR surname for the head of family are on the last Soundex roll for a state or territory, usually before the Institution Cards. Roll 34 of California's Soundex (T737) states "Not Reported thru Institutions," but most roll listings in this catalog do not reference this feature.
The NR-surname cards may include enough personal information such as color, sex, age, street, and house number to identify a person. Some cards also list members of the family or household by surname and may include an indexer's remarks about possible relationships.
There is even more complication. Sometimes the codes on the cards can appear in nonconsecutive order. In this case, researchers should use the alphabetized given names for searches.
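The filing order that the National Archives describes for cards sharing one code can be pictured as a sort key: Not Reported or blank given names first, then alphabetical order, with initials-only entries ahead of full names on the same letter. What follows is a rough Python sketch of that description only. The helper name `card_sort_key` and the sample names are invented for illustration, and real rolls will not always follow this order, as the quoted passage itself hedges with "may" and "ordinarily."

```python
def card_sort_key(given_name: str) -> tuple:
    """Sort key for cards that share one Soundex code (hypothetical helper)."""
    # Normalize case and collapse runs of whitespace.
    name = " ".join(given_name.upper().split())
    if name in ("", "NR"):
        return (0, "", 0, "")  # Not Reported or blank: filed first
    first_token = name.split()[0].rstrip(".")
    is_initial = len(first_token) == 1
    # Alphabetical by first letter; initials ahead of full names on that letter.
    return (1, name[0], 0 if is_initial else 1, name)


# Invented sample of given names found under a single code.
cards = ["John", "NR", "J.", "Abel", "J. William", "James"]
print(sorted(cards, key=card_sort_key))
# ['NR', 'Abel', 'J.', 'J. William', 'James', 'John']
```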
If it is so complicated, why would we use it now? Well, the Soundex system of cataloging data is still being used. Some websites, such as Ancestry.com, have an option to use Soundex. Here is a screenshot of the option when searching:
Ancestry.com also has a few collections of Soundex Reports.
There are still uses for Soundex.
Very useful to have details of how the Soundex codes were used. Thanks.