The goal of this project is to look for some patterns (or lack of patterns) in data. More precisely, you will investigate how often various digits appear in numerical data of different kinds.
Project
Part 1. Below are links to csv files containing numerical data from various sources. Analyze how often each non-zero digit appears as the first non-zero digit of the data values. You don’t need to use all the files, but you should analyze several of them. Use bar graphs (or other plots) to illustrate the results of your computations. Make observations, including an analysis of any patterns that appear. Then perform the same analysis for the last non-zero digits.
https://cdn.jsdelivr.net/gh/bbadzioch/mth337_site@main/projects/first_digits/data/nasdaq_2026_03_23.csv
This file contains information about trades of stocks listed on the New York Stock Exchange on March 23, 2026. Interesting data to analyze are the stock prices (opening price, closing price, and the low and high prices during the day) and the volumes of stocks traded during the day. This data was obtained from the website stooq.com.
https://cdn.jsdelivr.net/gh/bbadzioch/mth337_site@main/projects/first_digits/data/country_areas.csv
This file lists areas of countries in square kilometers and in square miles. The data was obtained from the website of The World Bank.
https://cdn.jsdelivr.net/gh/bbadzioch/mth337_site@main/projects/first_digits/data/country_populations.csv
This file lists populations of countries for several years. You can choose to analyze data for one or more years. This data was obtained from the website of The World Bank.
https://cdn.jsdelivr.net/gh/bbadzioch/mth337_site@main/projects/first_digits/data/airports.csv
This file contains information about airports and heliports around the world. The interesting numerical data here are the elevations of airports above sea level. This data was obtained from the website ourairports.com.
https://cdn.jsdelivr.net/gh/bbadzioch/mth337_site@main/projects/first_digits/data/libraries.csv
This file contains information about the holdings of public libraries in the United States in 2023. Interesting data to analyze include the population served by each library and the numbers of books, e-books, etc. that it owns. Another interesting option is to extract building numbers from library street addresses. This data was obtained from the website of the Institute of Museum and Library Services.
https://cdn.jsdelivr.net/gh/bbadzioch/mth337_site@main/projects/first_digits/data/capital_distances.csv
This file lists distances (in kilometers and miles) between capitals of countries. This data was obtained from the website of Kristian Skrede Gleditsch.
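One possible way to carry out these digit computations in Python is sketched below; it is a minimal example, not a prescribed method. It assumes each data value is available as the string it appears as in the csv file, for instance by reading a file with `pandas.read_csv(url, dtype=str)` and selecting a column (column names differ between the files and are not assumed here). Scanning the characters of the string skips signs, decimal points and leading zeros automatically, and an exponent part such as `e-05` is dropped first so that its digits are not counted. All function names are illustrative, and `building_number` is a hypothetical helper for taking the leading building number from a library street address.

```python
import re
from collections import Counter

DIGITS = range(1, 10)  # the non-zero digits 1-9


def _mantissa(value):
    """Text of value without any exponent part (e.g. "1.5e-03" -> "1.5")."""
    # str() lets ints, floats and missing values (NaN -> "nan") pass through safely.
    return re.split(r"[eE]", str(value).strip(), maxsplit=1)[0]


def first_nonzero_digit(value):
    """First non-zero digit of value, or None if it has none."""
    for ch in _mantissa(value):
        if ch in "123456789":
            return int(ch)
    return None


def last_nonzero_digit(value):
    """Last non-zero digit of value, or None if it has none."""
    for ch in reversed(_mantissa(value)):
        if ch in "123456789":
            return int(ch)
    return None


def digit_frequencies(values, digit_func):
    """Relative frequency of each digit 1-9, as returned by digit_func.

    Values for which digit_func returns None (no non-zero digit) are skipped.
    """
    counts = Counter(d for d in map(digit_func, values) if d is not None)
    total = sum(counts.values())
    return {d: (counts[d] / total if total else 0.0) for d in DIGITS}


def building_number(address):
    """Hypothetical helper: leading building number of a street address, as a string."""
    match = re.match(r"\s*(\d+)", str(address))
    return match.group(1) if match else None


# Example with made-up values (not taken from any of the data files).
prices = ["12.50", "0.034", "187", "1.5e-03", "-42.0", "N/A"]
freqs = digit_frequencies(prices, first_nonzero_digit)
print(freqs)  # 1 -> 0.6, 3 -> 0.2, 4 -> 0.2, all other digits -> 0.0
```

The dictionary returned by `digit_frequencies` maps each digit 1 to 9 to its relative frequency, so it can be plotted with matplotlib, e.g. `plt.bar(list(freqs), list(freqs.values()))`. Passing `last_nonzero_digit` instead of `first_nonzero_digit` gives the corresponding frequencies of last non-zero digits.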
Part 2. Below is a link to a text file with about 1000 randomly selected Wikipedia articles. Find all numbers appearing in the text and analyze the frequencies of the first and last non-zero digits of these numbers. Compare the results with those you obtained in Part 1:
https://cdn.jsdelivr.net/gh/bbadzioch/mth337_site@main/projects/first_digits/data/wikipedia_sample.txt
Note. Wikipedia contains many year numbers, which start with either 1 or 2. Check how your results change if you discard year numbers (for example, you can simply ignore all integers between 1000 and 2050).
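Finding the numbers in the Wikipedia sample requires a working definition of a number. The sketch below is a minimal example that assumes a simple one: a run of digits, possibly grouped with comma thousands separators and followed by a decimal part; signs, scientific notation and numbers written in words are ignored. Years are detected with the rule from the note (integers from 1000 to 2050). As a design choice, a number written with a separator, such as 1,500, is not treated as a year, on the assumption that years are normally written without one. The function names are hypothetical, and the text itself could be loaded with, e.g., `urllib.request.urlopen(url).read().decode("utf-8")`, assuming the file is UTF-8 encoded.

```python
import re

# A number is either digits grouped by comma thousands separators ("12,500")
# or a plain run of digits, each optionally followed by a decimal part.
# re.ASCII restricts \d to the characters 0-9.
NUMBER_RE = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?", re.ASCII)


def find_numbers(text):
    """All numbers in text, as the strings they were written as."""
    return NUMBER_RE.findall(text)


def is_year(number, low=1000, high=2050):
    """True for plain integers in [low, high], which the note treats as years."""
    return number.isdigit() and low <= int(number) <= high


def drop_years(numbers, low=1000, high=2050):
    """The numbers that are not treated as years."""
    return [n for n in numbers if not is_year(n, low, high)]


sample = "In 1999, the town had 12,500 residents and 3.75 km of roads; by 2021 about 1,500 more."
numbers = find_numbers(sample)
print(numbers)              # ['1999', '12,500', '3.75', '2021', '1,500']
print(drop_years(numbers))  # ['12,500', '3.75', '1,500']
```

Because the numbers are kept as strings, their first and last non-zero digits can be read off by scanning characters and skipping commas, decimal points and zeros.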