For this project we are going to be using a data set representing flight data from 1990 – 2009. The complete data set has over 3.5 million records. We will only be using parts of it for our testing for the sake of time. However, if you can write a successful project for the smaller data set, it should work for the complete data set. I’ll post a link to the complete data set at the end for anyone who is interested in it.
The data has 11 fields that are described here:
Origin – string – 3 letter airport code
Destination – string – 3 letter airport code
Origin City – string – origin city name
Destination City – string – destination city name
Passengers – integer – the number of passengers on the flight
Seats – integer – the number of seats available on the flight
Flights – integer – the number of flights between the origin and destination for the given month/year combination
Distance – integer – distance (to the nearest mile) flown between the origin and destination
Fly Date – integer – The date (yyyymm) of flight
Origin Population – integer – Origin city’s population as reported by U.S. Census
Destination Population – integer – Destination city’s population as reported by U.S. Census
CLE ERI Cleveland, OH Erie, PA 0 0 56 98 200703 2099185 279804
DLH ERI Duluth, MN Erie, PA 44 173 1 678 200703 273889 279804
DAY ERI Dayton, OH Erie, PA 33 50 1 259 200712 839563 279804
EWR ERI Newark, NJ Erie, PA 0 122 1 326 200708 18901167 279804
ALB ERI Albany, NY Erie, PA 19 37 1 329 200703 852019 279804
CAK ERI Akron, OH Erie, PA 0 120 1 104 200708 700559 279804
ERI ERI Erie, PA Erie, PA 104 112 1 0 200701 279804 279804
RNO ERI Reno, NV Erie, PA 143 150 1 2065 200704 410616 279804
ACY ERI Atlantic City, NJ Erie, PA 97 122 1 345.0 200712 269945 279804
ACY ERI Atlantic City, NJ Erie, PA 108 120 1 345.0 200707 269945 279804
The data is a tab-separated data file, so between each part of the data is a tab character (\t). You can/should watch the How To Read Tab Delimited set of videos for more information on how you can read in this type of data. Those videos are intended to give you an idea of how to get started.
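As a rough illustration of reading one tab-delimited record, here is a minimal sketch. The function name readRecord and its parameter names are hypothetical, not part of the assignment. It assumes std::string and string streams are permitted, which is worth confirming given the restrictions on the STL. Because the sample data shows a distance of 345.0, the sketch reads the distance as a double rather than an integer; that is also an assumption.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one tab-separated record from `in`.
// Every field comes back through a reference parameter, so no struct,
// array, pointer, or global variable is needed.
// Returns false at end of file or when a line cannot be parsed.
bool readRecord(std::istream& in,
                std::string& origin, std::string& dest,
                std::string& originCity, std::string& destCity,
                int& passengers, int& seats, int& flights,
                double& distance, int& flyDate,
                int& originPop, int& destPop)
{
    std::string line;
    if (!std::getline(in, line)) {
        return false;                       // nothing left to read
    }

    std::istringstream fields(line);

    // City names contain spaces and commas, so the text fields are split
    // on the tab character instead of being read with >>.
    std::getline(fields, origin, '\t');
    std::getline(fields, dest, '\t');
    std::getline(fields, originCity, '\t');
    std::getline(fields, destCity, '\t');

    // The numeric fields can be read with >>, which skips tabs.
    fields >> passengers >> seats >> flights >> distance
           >> flyDate >> originPop >> destPop;

    return !fields.fail();
}
```

A search function could call this in a loop such as while (readRecord(...)) and look at each record as it goes by, without keeping earlier records around.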
In addition to the data file, we will have another input file in the form of a command file. In fact, the command file will be the input to the program.
The commands will have your program use a given data file to search for and summarize various pieces of information about the given input data file. The commands are detailed here:
c2c – this command will be followed by 2 strings. The strings represent potential origin and destination airport codes. You are to search the given data file for any line that matches the given pair of origin and destination codes. If you find a match, then you are to show the information about the flight that matches the origin and destination. You may find many matches. Additionally, when you have checked all the data in the file, you are to summarize the number of flights, average number of passengers per flight, average number of seats per flight, and average number of flights. If the origin and destination cities aren’t found then an error message is output.
citySummary – this command is followed by an airport code. If the airport is found, then a summary of information about the city is given. You will show the total number of flights, number of inbound flights, number of outbound flights, average number of passengers per flight, average number of seats, and average number of flights. If the airport code is not found, then you issue an error message.
smallest – this command will search the data file and find the smallest airport by population. This will use both destination and origin population to find the smallest. Once the file is processed you will output the information about the smallest city.
largest – this command will search the data file and find the largest airport by population. This will use both destination and origin population to find the largest. Once the file is processed you will output the information about the largest city.
cityInfo – this command will be followed by a word, or words, and you are to match them with a city in the file. If the city is found, then the airport code and city name are displayed. The word, or words, may partially match either the origin or destination city names. Use find on those variables to search within the strings. Once you have found a match, you may stop looking. We will only match to the first one we find. If no matching city is found, an error message will be displayed.
datafile – this command will be followed by a filename. It is the file you are to use to do the searching. All you need to do is store it and pass it to the functions you write to perform the searches. Those functions should open it to search.
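For the partial match that cityInfo asks for, a minimal sketch of using find on a string is shown here. The function name cityMatches is hypothetical. The detail that is easy to miss is that find returns a position, and position 0 is a valid match, so the result has to be compared against std::string::npos rather than treated as true or false. The sketch also assumes the match is case-sensitive, which the assignment does not state either way.

```cpp
#include <string>

// Hypothetical helper: true when `searchText` appears anywhere inside
// `cityName`, for example "Roanoke" inside "Roanoke, VA".
bool cityMatches(const std::string& cityName, const std::string& searchText)
{
    // find returns the index of the first match, or npos when there is none.
    return cityName.find(searchText) != std::string::npos;
}
```

The same check would be applied to the origin city and then the destination city of each record, stopping at the first record where either one matches.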
Additionally, for each of these commands you need to echo the word “Command:”, the command itself, and any arguments that are given. See the sample output for more information about this and for what the error messages should be.
So there are really two input files for our program. The first is the command file; it will be passed to our function via the input parameter. The second is the data file that is to be used for the searching. It will be listed in the command file after the datafile command. The datafile command may never occur, or may occur one or more times.
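One possible shape for the command-processing loop is sketched below. The name runCommands is hypothetical, and the sketch takes streams rather than file names so it can be tried on a string; in the project the streams would be opened from the input and output parameters. It only echoes each command and remembers the most recent data file. The calls to the search functions are left as comments because their signatures are up to you.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the command file one line at a time,
// echoes each command, and keeps track of the current data file name.
void runCommands(std::istream& commands, std::ostream& out)
{
    std::string dataFile;   // empty until a datafile command is seen
    std::string line;

    while (std::getline(commands, line)) {
        std::istringstream words(line);
        std::string command;
        if (!(words >> command)) {
            continue;       // skip blank lines
        }

        out << "Command: " << line << '\n';

        if (command == "datafile") {
            words >> dataFile;              // stored for later commands
        } else if (command == "c2c") {
            std::string origin, dest;
            words >> origin >> dest;
            // the c2c function would be called here with dataFile,
            // origin, and dest
        } else if (command == "cityInfo") {
            std::string city;
            std::getline(words >> std::ws, city);   // may be several words
            // the cityInfo function would be called here
        }
        // citySummary, smallest, and largest would be handled the same way
    }
}
```

Because dataFile is a local variable that gets passed along, a later datafile command simply replaces it, and no global variable is involved.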
Sample Command File
datafile flight_edges.tsv
c2c ROA CLT
c2c NOT ME!
citySummary ROA
citySummary WHERE
smallest
largest
cityInfo Roanoke
cityInfo WHERE?
If you were given the command file above, here is most of the output. I trimmed the output from the c2c ROA CLT command since it has over 800 lines.
Command: datafile flight_edges.tsv
Command: c2c ROA CLT
 Origin Destination Origin City Destination City Passengers Seats Flights Distance Date Origin Pop Dest Pop
 ROA CLT Roanoke, VA Charlotte, NC 318 1057 7 155 12/1990 269195 1029829
 ROA CLT Roanoke, VA Charlotte, NC 7197 15070 137 155 12/1990 269195 1029829
 ROA CLT Roanoke, VA Charlotte, NC 7219 12320 112 155 3/1990 269195 1029829
 ROA CLT Roanoke, VA Charlotte, NC 111 432 4 155 3/1990 269195 1029829
<snip>
 ROA CLT Roanoke, VA Charlotte, NC 1626 2050 41 155 12/2009 300399 1745524
 ROA CLT Roanoke, VA Charlotte, NC 1365 1961 53 155 12/2009 300399 1745524
 ROA CLT Roanoke, VA Charlotte, NC 1980 2750 55 155 11/2009 300399 1745524
 ROA CLT Roanoke, VA Charlotte, NC 1100 1680 24 155 11/2009 300399 1745524
 Total # of flights: 858
 Average # of passengers: 1851.04
 Average # of seats: 3244.59
 Average # of flights: 44.2145
Command: c2c NOT ME!
 Sorry no flights between NOT and ME!
Command: citySummary ROA
 Total # of flights: 9943
 # inbound flights: 4977
 # outbound flights: 4966
 Average # of passengers: 987.941
 Average # of seats: 1832.73
 Average # of flights: 34.1203
Command: citySummary WHERE
 Sorry no flights for WHERE found.
Command: smallest
 Origin Destination Origin City Destination City Passengers Seats Flights Distance Date Origin Pop Dest Pop
 HNL AWX Honolulu, HI Andrews, TX 0 173 1 3466 7/2002 882628 12887
Command: largest
 Origin Destination Origin City Destination City Passengers Seats Flights Distance Date Origin Pop Dest Pop
 LGA CAK New York, NY Akron, OH 134 137 1 396 12/2009 38139592 699935
Command: cityInfo Roanoke
Airport Code: ROA
City: Roanoke, VA
Command: cityInfo WHERE?
 Sorry no information for WHERE? found.
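Note that the Date column in the sample output appears as month/year (for example 12/1990), while the Fly Date field in the data file is stored as yyyymm. A minimal sketch of that conversion follows. The name formatFlyDate is hypothetical, and the sketch assumes the month is printed without a leading zero, as the 3/1990 lines suggest. std::to_string requires C++11 or later.

```cpp
#include <string>

// Hypothetical helper: turns a yyyymm integer such as 199012 into the
// month/year text used in the sample output, such as "12/1990".
std::string formatFlyDate(int flyDate)
{
    int month = flyDate % 100;   // last two digits
    int year  = flyDate / 100;   // remaining leading digits
    return std::to_string(month) + "/" + std::to_string(year);
}
```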
So you can see that for this example I used the complete flight_edges.tsv file. I will not give you the complete file on web-cat, as it is very large. I will be giving you smaller ones that come from the complete file. If you can handle the complete file, then you can handle any of the smaller ones.
Below is the list of requirements. You can receive these points even if your code is not working, and if you don’t follow these requirements you won’t score a 100.
You must declare void flightData( string input, string output ); in a file named flight.h
You may not store the data from the files in an array, list, vector, etc. or any storage container from the STL.
You may not use the STL.
You must write a function for each of the commands listed above. The only exception is the datafile command. The datafile command does not need a function.
I highly recommend you write a function to read one line from the data file and a function to output a single record. It will help.
You may not use a struct, class, or other user-defined data type to store the records.
You may not store the data in an array; simply process it sequentially in each function.
You may not use pointers.
You may not use global variables. <== This is a big one… ask if you don’t know how to pass parameters correctly.
You may not write your code in the .h file; write it in the .cpp file.
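Since records cannot be stored, the summaries have to be built up while the file is being read. A minimal sketch of that idea is shown below, with running totals passed by reference. The names addRecord and averageOf are hypothetical. The sketch assumes each average is taken over the number of matching records, which is what the sample output appears to show (858 matching lines for c2c ROA CLT), and it uses long long for the totals on the assumption that sums over the complete file could exceed the range of int.

```cpp
// Hypothetical helper: folds one matching record into the running totals.
// Nothing about the record is kept after this call returns.
void addRecord(int passengers, int seats, int flights,
               long long& totalPassengers, long long& totalSeats,
               long long& totalFlights, int& recordCount)
{
    totalPassengers += passengers;
    totalSeats      += seats;
    totalFlights    += flights;
    ++recordCount;
}

// Hypothetical helper: average per matching record, or 0 when nothing
// matched, which avoids dividing by zero.
double averageOf(long long total, int recordCount)
{
    if (recordCount == 0) {
        return 0.0;
    }
    return static_cast<double>(total) / recordCount;
}
```

A record count of zero after the whole file has been read is also a natural way to decide that the error message should be printed instead of a summary.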