
3.11 Managing Data — An Introduction to Lists



In[3]:= myList[[5]]

Out[3]= 32

In[4]:= myList〚5〛

Out[4]= 32



We’ll use this last notation throughout this section as it is easy to read, but remember that you may

simply use double square brackets (which are easier to type). A negative number inside the double

square brackets indicates an item’s position relative to the end of the list. For instance, here is the

second to last item:

In[5]:= myList〚-2〛

Out[5]= 512



To extract a sequential portion of a longer list, one may indicate a Span of positions as follows:

In[6]:= myList〚1 ;; 4〛

Out[6]= {2, 4, 8, 16}



The most commonly specified items in a list are the first and last. There are, for convenience, special

commands to extract these items (although myList〚1〛 and myList〚-1〛 work just as well):

In[7]:= First[myList]

Out[7]= 2

In[8]:= Last[myList]

Out[8]= 1024
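The extraction forms discussed so far can be gathered in one place. What follows is a minimal sketch that assumes myList holds the first ten powers of 2. That assumption is consistent with the outputs shown in this section, but the definition itself does not appear in this excerpt. The sketch uses plain double square brackets, since they are easier to type.

```mathematica
(* Assumption: myList is the first ten powers of 2, matching the outputs above *)
myList = Table[2^n, {n, 10}];

myList[[5]]         (* fifth item: 32 *)
myList[[-2]]        (* second to last item: 512 *)
myList[[1 ;; 4]]    (* Span of positions 1 through 4: {2, 4, 8, 16} *)
myList[[-3 ;; -1]]  (* a Span may also count from the end: {256, 512, 1024} *)

(* First and Last agree with positions 1 and -1 *)
{First[myList], Last[myList]} == {myList[[1]], myList[[-1]]}  (* True *)
```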



Most of Mathematica’s arithmetic operations have the Listable attribute. This means they will be

“mapped over lists.” In other words, each item in the list will be operated upon individually by these

commands, and the list of results will be displayed. This is extremely handy. For example:

In[9]:= {1, 2, 3, 4} + 1

Out[9]= {2, 3, 4, 5}

In[10]:= 2 myList

Out[10]= {4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048}

In[11]:= Log[2, myList]

Out[11]= {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}



To find out if a command has the Listable attribute, type ?? followed by the command name, and evaluate the cell. All the attributes of the command will appear (along with a brief description of the command and a list of its default option settings).
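The same check can be made without reading through the ?? output. The sketch below uses the built-in Attributes command to ask directly whether Listable is among a command's attributes. The choice of Log, Plus, and First as test cases is ours, made only for illustration.

```mathematica
Attributes[Log]                        (* the list returned includes Listable *)
MemberQ[Attributes[Plus], Listable]    (* True *)
MemberQ[Attributes[First], Listable]   (* False: First acts on the list as a whole *)

(* Listable operations also combine two lists of equal length item by item *)
{1, 2, 3} + {10, 20, 30}               (* {11, 22, 33} *)
```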



Recall that Mathematica stores a two-dimensional data table as a list of lists. That is, the data table is

stored as one long list, the members of which are the rows of the table. Each row of the table is in

turn stored as a list:



In[12]:= data = {{1, 214}, {11, 378}, {21, 680}, {31, 1215}, {41, 2178}, {51, 3907}}

Out[12]= {{1, 214}, {11, 378}, {21, 680}, {31, 1215}, {41, 2178}, {51, 3907}}



In[13]:= data〚3〛

Out[13]= {21, 680}



To extract the item in row 3, column 2, do this:

In[14]:= data〚3, 2〛

Out[14]= 680



To extract an entire column of a two-dimensional table, use All in the first position within the

double bracket:

In[15]:= data〚All, 2〛

Out[15]= {214, 378, 680, 1215, 2178, 3907}



If your data happens to contain many columns, and you want to extract, say, only the second and

fourth columns, type data〚All, {2, 4}〛.
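Since the data table used so far has only two columns, here is a small sketch of column extraction on a wider table. The table wide is hypothetical and exists only for this illustration.

```mathematica
(* Hypothetical table with 3 rows and 4 columns, used only for illustration *)
wide = {{1, "a", 10, 100}, {2, "b", 20, 200}, {3, "c", 30, 300}};

wide[[All, {2, 4}]]   (* columns 2 and 4: {{"a", 100}, {"b", 200}, {"c", 300}} *)
wide[[All, 2 ;; 4]]   (* a Span selects the adjacent columns 2 through 4 *)
```

A list of positions picks out arbitrary columns in the order given, while a Span is the shorter way to ask for a run of adjacent columns.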

The importance of these extraction commands manifests itself in situations that call for a transformation of the data. In most cases this will amount to performing some arithmetic operation on every

item in a column of your data table. For instance, one column of a table may comprise the x coordinates of your data points, while another contains the corresponding y coordinates. You may want to

subtract 70 from all the x coordinates, or take the logarithm of all the y coordinates. How can this be

accomplished?

The simplest situation is one in which the same operation is to be applied to every member of a data

table. The listable attribute of most operations makes this a one-step process. For instance:



In[16]:= Log[data] // Grid

Out[16]= 0        Log[214]
         Log[11]  Log[378]
         Log[21]  Log[680]
         Log[31]  Log[1215]
         Log[41]  Log[2178]
         Log[51]  Log[3907]



If you wish to operate on just one of the columns, things are almost as simple. Suppose, for

instance, that you want to take the logarithm of only the second column. One might proceed as

follows (where we make a duplicate copy of the original data, then overwrite the second column in

this copy):

In[17]:= newData = data;
         newData〚All, 2〛 = Log[data〚All, 2〛];
         newData // Grid

Out[19]= 1   Log[214]
         11  Log[378]
         21  Log[680]
         31  Log[1215]
         41  Log[2178]
         51  Log[3907]



Another method of accomplishing the same task invokes the useful Transpose command, which

switches rows and columns in a two-dimensional table.

In[20]:= Transpose[{data〚All, 1〛, Log[data〚All, 2〛]}] // Grid

Out[20]= 1   Log[214]
         11  Log[378]
         21  Log[680]
         31  Log[1215]
         41  Log[2178]
         51  Log[3907]
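Both transformations mentioned earlier (subtracting 70 from every x coordinate and taking the logarithm of every y coordinate) can be carried out in one pass with the Transpose approach. The sketch below is one way to do it, not the only one. The names shifted and shifted2 are ours, and N is applied so that the logarithms appear as decimal numbers.

```mathematica
data = {{1, 214}, {11, 378}, {21, 680}, {31, 1215}, {41, 2178}, {51, 3907}};

(* Build the new table from two transformed columns *)
shifted = Transpose[{data[[All, 1]] - 70, N[Log[data[[All, 2]]]]}];

(* Equivalent row-by-row form: apply a pure function to each {x, y} pair *)
shifted2 = ({#1 - 70, N[Log[#2]]} &) @@@ data;

shifted == shifted2   (* True *)
shifted[[All, 1]]     (* {-69, -59, -49, -39, -29, -19} *)
```

The column-wise form leans on the Listable attribute, while the row-wise form is handy when a transformation needs several entries from the same row at once.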



This latter approach suggests a useful means of extracting a few columns from a larger table of data

and applying transformations to them selectively. Here, for example, is a somewhat random collection of data:






In[21]:= data = Table[{x, RandomInteger[10], RandomReal[10], RandomComplex[]}, {x, 6}];
         Grid[data, Dividers → Gray]

Out[22]= 1   4   1.96983   0.201769 + 0.55101 i
         2   10  8.8533    0.388002 + 0.537243 i
         3   7   1.79462   0.873406 + 0.754408 i
         4   7   8.99804   0.338286 + 0.392776 i
         5   0   5.90026   0.903198 + 0.78486 i
         6   7   1.71122   0.610062 + 0.528001 i



And here is a new data table consisting only of the first column and the natural logarithm of the

third column:

In[23]:= newData = Transpose[{data〚All, 1〛, Log[data〚All, 3〛]}];
         Grid[newData, Dividers → Gray]

Out[24]= 1   0.677946
         2   2.18079
         3   0.584791
         4   2.19701
         5   1.775
         6   0.537204



So, for instance, one may now apply ListPlot or Fit to the newData, as discussed in the previous

section.



Exercises 3.11

1. Suppose that data is input as a Table with 120 rows and 6 columns.

a. What command could you use to extract only columns 2 and 6?

b. What command could you issue to extract only the last 119 rows of columns 2 and 6 (for

instance, imagine that the first row contains headings for the columns and not actual data)?

c. What command could you issue to extract only the last 119 rows of columns 2 and 6, and

then replace column 6 with the natural logarithm of its values?



3.12 Importing Data




The simplest means of bringing external data into Mathematica is by utilizing the “paclet”

technology introduced in version 6. Many collections of data are curated regularly and stored on

servers at Wolfram Research. Mathematica has built-in access to these data (provided your computer

has internet access). That is, many built-in commands will simply call up these servers and deliver

hot, fresh data paclets to your current Mathematica session.

An example is in order. The command CountryData is used to access data about countries, continents, and so forth. Like the other data commands, CountryData may be called with an empty argument to produce a list of basic data objects. You will notice a slight delay before the output appears,

but this will only happen the first time a data command is evaluated in a session; this is when the

data is transferred from the central server to your computer.

In[1]:= Short[CountryData[], 3]

Out[1]//Short= {Afghanistan, Albania, <<233>>, Zambia, Zimbabwe}

In[2]:= CountryData[] // Length

Out[2]= 237



Many of the data commands allow the single argument "Properties", which will list the properties

available for each of the countries (or for the primary data objects of the data command you are

using). At the time of this writing, there are 225 properties available for the country data:

In[3]:= Short[CountryData["Properties"], 3]

Out[3]//Short= {AdultPopulation, AgriculturalProducts, <<222>>, WaterwayLength}



The typical usage of CountryData takes the form CountryData["tag", "property"], where "tag" is a

string (i.e., it is enclosed in double quotation marks) representing a country or group of countries

(such as "UnitedStates" or "G8"), and "property" is a string representing the desired property for that

country. A similar syntax applies to the other data commands. For instance:

In[4]:= CountryData["UnitedStates", "Population"]

Out[4]= 2.98213 × 10^8



One may specify a date or a range of dates for the property as follows. In the latter case the output is

suitable for inclusion in the DateListPlot command:

In[5]:= CountryData["UnitedStates", "Population", {1970}]

Out[5]= 2.10111 × 10^8






In[6]:= DateListPlot[CountryData["Kuwait", "Population", {1970, 2006}]]

Out[6]= [Plot: population of Kuwait by date, 1970 to 2006; the vertical axis runs from 1. × 10^6 to 2.5 × 10^6]



Here is the gross domestic product of Germany, in US dollars, at the official exchange rate in place

at the time of this writing:

In[7]:= CountryData["Germany", "GDP"]

Out[7]= 2.79486 × 10^12



Here is Greenland’s oil consumption in barrels per day:

In[8]:= CountryData["Greenland", "OilConsumption"]

Out[8]= 3850.



And here we generate a list giving the name, gross domestic product, and oil consumption for every

country. To accomplish this we use Table, where c ranges over the list of all possible countries. To

save space, we use 〚1 ;; 6〛 to take only the first six rows of data:

In[9]:= Table[{c, CountryData[c, "GDP"], CountryData[c, "OilConsumption"]},
          {c, CountryData[]〚1 ;; 6〛}] // Grid

Out[9]= Afghanistan    6.50383 × 10^9    5000.
        Albania        8.53753 × 10^9    25200.
        Algeria        1.02257 × 10^11   246000.
        AmericanSamoa  3.338 × 10^8      4000.
        Andorra        2.88526 × 10^9    Missing[NotAvailable]
        Angola         3.0909 × 10^10    46000.



Note the syntax used for missing data. With a bit of effort one can tweak the input above to produce

a nicely formatted table. To save space, we again use 〚1 ;; 6〛 to take only the first six rows of data:



In[10]:= Text @ Grid[
           Prepend[
             Table[{c, CountryData[c, "GDP"], CountryData[c, "OilConsumption"]},
               {c, CountryData[]〚1 ;; 6〛}],
             Table[Style[x, FontWeight → "Bold"],
               {x, {"Country", "Gross Domestic Product (US dollars)",
                 "Oil Consumption (Barrels per day)"}}]
           ],
           Dividers → {Center, {False, True}}, Spacings → 2, Alignment → {Left, Center}]

Out[10]= Country        Gross Domestic Product (US dollars)   Oil Consumption (Barrels per day)
         Afghanistan    6.50383 × 10^9                        5000.
         Albania        8.53753 × 10^9                        25200.
         Algeria        1.02257 × 10^11                       246000.
         AmericanSamoa  3.338 × 10^8                          4000.
         Andorra        2.88526 × 10^9                        Missing[NotAvailable]
         Angola         3.0909 × 10^10                        46000.



In the exercises we illustrate how to Sort the rows of such a table, for instance by oil consumption,

how to throw out rows containing missing data, and how to Select only rows, for instance, in which

gross domestic product exceeds a certain value. In short, the commands Sort and Select are needed

for such manipulations.
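As a preview of those manipulations, here is a minimal sketch that uses only Sort and Select on the six rows displayed above. The rows are typed in by hand so that the sketch runs without a connection to the data servers, and the names rows, complete, and byOil are ours. The threshold of 10^10 dollars is an arbitrary choice made for illustration.

```mathematica
(* The six rows displayed above, typed in by hand so the sketch runs offline *)
rows = {
   {"Afghanistan", 6.50383*^9, 5000.},
   {"Albania", 8.53753*^9, 25200.},
   {"Algeria", 1.02257*^11, 246000.},
   {"AmericanSamoa", 3.338*^8, 4000.},
   {"Andorra", 2.88526*^9, Missing["NotAvailable"]},
   {"Angola", 3.0909*^10, 46000.}};

(* Throw out any row that contains a Missing[...] entry *)
complete = Select[rows, FreeQ[#, _Missing] &];

(* Sort the remaining rows by oil consumption, the last item of each row *)
byOil = Sort[complete, Last[#1] < Last[#2] &];
byOil[[All, 1]]
(* {"AmericanSamoa", "Afghanistan", "Albania", "Angola", "Algeria"} *)

(* Keep only rows whose GDP exceeds 10^10 US dollars *)
Select[complete, #[[2]] > 10^10 &][[All, 1]]
(* {"Algeria", "Angola"} *)
```

Removing the rows with missing data before sorting keeps the ordering function simple, since it then only ever compares numbers.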

Here we make a ListPlot of the full data table above, showing each country’s annual gross domestic

product in U.S. dollars in the x coordinate, and that country’s oil consumption in barrels per day in

the y coordinate. A logarithmic scale is used on each axis. Missing data are simply not shown.

In[11]:= ListLogLogPlot[Table[{CountryData[c, "GDP"],
           CountryData[c, "OilConsumption"]}, {c, CountryData[]}]]

Out[11]= [Log-log scatter plot: gross domestic product on the horizontal axis, 10^8 to 10^13; oil consumption on the vertical axis, 100 to 10^7]



A slight modification allows us to add a Tooltip showing the name of each country as you mouse over its dot on the graphic. You’ll have to experience this in a live session to appreciate it. Essentially, a tooltip such as this adds another dimension of content to your information graphic.






In[12]:= ListLogLogPlot[
           Table[Tooltip[{CountryData[c, "GDP"], CountryData[c, "OilConsumption"]},
             CountryData[c, "Name"]], {c, CountryData[]}]]

Out[12]= [Plot not reproduced]



Many of the data commands can produce graphical content. One can easily produce a map of each

country, for example:

In[13]:= CountryData["Greece", "Shape"]

Out[13]= [Map graphic not reproduced]

In[14]:= GraphicsGrid[Partition[Table[CountryData[c, "Shape"], {c, CountryData["G8"]}], 4],
           Dividers → All, ImageSize → 320]

Out[14]= [Grid of map graphics not reproduced]



Many of the data commands load gigantic collections of data. AstronomicalData, for instance,

which has information on over 100,000 celestial bodies, is astronomical in size. ChemicalData has

information on over 18,000 chemicals. FinancialData has up-to-date information on over 186,000

securities. Each data command has its own unique syntax conventions, so the Documentation

Center page for each such command is a must read. But there are also many similarities between

commands; if you become familiar with one command, others will be easy to learn. For instance,






after reading this section the input and output below should be self-explanatory, with only the units

in need of explanation (in this case the units are seconds):

In[15]:= AstronomicalData["Earth", "OrbitPeriod"]

Out[15]= 3.1558149 × 10^7



Here we illustrate a pattern first deduced by Kepler—there is a mathematical relation between a

planet’s orbital period and its distance to the sun:

In[16]:= data = Table[{AstronomicalData[p, "OrbitPeriod"],
           AstronomicalData[p, "SemimajorAxis"]}, {p, AstronomicalData["Planet"]}];



In[17]:= ListLogLogPlot[data, AspectRatio → .3, ImageSize → 244]

Out[17]= [Log-log plot of data: horizontal axis from 5 × 10^7 to 5 × 10^9; vertical axis from 1 × 10^11 to 5 × 10^12]

In[18]:= FindFit[data, a x^b, {a, b}, x]

Out[18]= {a → 1.496467 × 10^6, b → 0.6667315}

In[19]:= Show[Plot[1496476 x^(2/3), {x, 0, 10^10}], ListPlot[data]]

Out[19]= [Plot of the fitted curve together with the data points: horizontal axis from 0 to 1. × 10^10; vertical axis from 0 to 7. × 10^12]

Hence orbital “radius” is proportional to (orbital period)^(2/3), or as Kepler put it, radius^3 ∝ period^2. The



point is simply that facility with one data command makes the other data commands a quick study,

and that facility with lists and data fitting makes the work of finding meaningful relations in data a

snap.
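The reason the log-log plot suggested a power law in the first place is that a relation of the form a x^b becomes the straight line Log[a] + b Log[x] once logarithms are taken of both coordinates. The sketch below illustrates this with synthetic points that lie exactly on such a curve. These are not planetary data; the constant 1.5 × 10^6 and the sample x values are assumptions chosen only to resemble the fit above.

```mathematica
(* Synthetic points lying exactly on y = 1.5*^6 x^(2/3); NOT planetary data *)
synthetic = Table[{t, 1.5*^6 t^(2/3)}, {t, {1.*^7, 3.*^7, 1.*^8, 1.*^9, 5.*^9}}];

(* Log is Listable, so it is applied to both coordinates of every point.
   A straight-line Fit to the logged data recovers b as the slope. *)
line = Fit[Log[synthetic], {1, x}, x];

Coefficient[line, x]    (* slope, approximately 0.666667 *)
Exp[line /. x -> 0]     (* recovered a, approximately 1.5*^6 *)
```

With real measurements the slope would only be close to 2/3, which is what FindFit reported above.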

In addition to built-in data commands, it is common practice to import data from other sources,

such as a spreadsheet or text file, or directly from a web page. Suppose, for instance, you find a

collection of raw data on a web page. For example, if you were to visit the URL

http://www.census.gov/genealogy/names/dist.male.first you would find a collection of curated data






from the 1990 United States census in which male first names are ranked by frequency. The web

page is simply a plain text file containing four columns of data, with one or more spaces separating

data values on each row, and with a return character at the end of each row. Use the Import command with a single argument, a string containing the URL for the web site, to bring the data into

Mathematica.

In[20]:= data = Import["http://www.census.gov/genealogy/names/dist.male.first"];



There are over 1200 rows of data here. To save space we display only the top-ten list of male first

names:

In[21]:= Text @
           Grid[Join[{{"Most Popular Male First Names from the 1990 Census", SpanFromLeft},
             {"Name", "Frequency (%)", "Cumulative Frequency (%)", "Rank"}},
             data〚1 ;; 10〛], Dividers → Gray]

Out[21]= Most Popular Male First Names from the 1990 Census
         Name      Frequency (%)   Cumulative Frequency (%)   Rank
         JAMES     3.318           3.318                      1
         JOHN      3.271           6.589                      2
         ROBERT    3.143           9.732                      3
         MICHAEL   2.629           12.361                     4
         WILLIAM   2.451           14.812                     5
         DAVID     2.363           17.176                     6
         RICHARD   1.703           18.878                     7
         CHARLES   1.523           20.401                     8
         JOSEPH    1.404           21.805                     9
         THOMAS    1.38            23.185                     10



The ListPlot below shows the cumulative frequency distribution for the entire data set. Note that

when a single list of numerical values is given as the argument to ListPlot, the x-coordinate values

1, 2, 3, … are used. This allows us to easily see that there are slightly more than 1200 data points. It

also reveals that the 200 most popular male names account for over 70% of all males in the U.S.

In[22]:= ListPlot[data〚All, 3〛, Joined → True]

Out[22]= [Plot of cumulative frequency against rank: horizontal axis from 0 to 1200; vertical axis ticks from 60 to 90]
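The third column of this table is simply the running total of the second, which can be confirmed with Accumulate. The sketch below uses the first four rows of the table shown earlier, typed in by hand so that it runs without importing anything. The name top is ours.

```mathematica
(* The first four rows of the imported table, typed in by hand *)
top = {{"JAMES", 3.318, 3.318, 1}, {"JOHN", 3.271, 6.589, 2},
   {"ROBERT", 3.143, 9.732, 3}, {"MICHAEL", 2.629, 12.361, 4}};

(* Column 3 is the running total of column 2 *)
Accumulate[top[[All, 2]]]   (* approximately {3.318, 6.589, 9.732, 12.361} *)

(* With the full imported table, the share covered by the 200 most
   popular names would be read off as data[[200, 3]] *)
```

Further down the table the published cumulative values can differ from a recomputed running total in the last decimal place, because each frequency is rounded to three places.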






This same import technique works for many types of raw files that are found online, even graphic

files:

In[23]:= pic = Import[
           "http://faculty.rmc.edu/btorrenc/btbikeclub/images/PoorFarmConesBW1099.JPG",
           ImageSize → 180]

Out[23]= [Imported photograph not reproduced]



The InputForm of this image reveals it to be a Raster of a matrix of pixel values. Once imported,

one could apply a transformation to the matrix of numerical values to alter the image.

In[24]:= Short[InputForm[pic], 3]

Out[24]//Short= Graphics[Raster[{{255, <<679>>}, <<519>>}, <<3>>], <<3>>]
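To make the idea of transforming the pixel matrix concrete, here is a minimal sketch that produces a photographic negative. It works on a tiny stand-in graphic rather than the imported picture. The names small and negative are ours, and the sketch assumes gray levels between 0 and 255, as the 255 in the output above suggests. Applying the same rule to pic assumes that the import yields a Graphics object containing a Raster, as shown in Out[24].

```mathematica
(* A tiny stand-in image: a 2 x 3 matrix of gray levels between 0 and 255 *)
small = Graphics[Raster[{{0, 64, 128}, {192, 224, 255}}, {{0, 0}, {3, 2}}, {0, 255}]];

(* Replace the pixel matrix m by 255 - m to get the negative;
   any remaining arguments of Raster are passed through unchanged *)
negative = small /. Raster[m_, rest___] :> Raster[255 - m, rest];

First[negative]
(* Raster[{{255, 191, 127}, {63, 31, 0}}, {{0, 0}, {3, 2}}, {0, 255}] *)
```

Because subtraction is Listable, 255 - m acts on every pixel value at once, exactly as the arithmetic on data tables did in the previous section.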



When the data you’re after is found in a formatted table on a web page, add "Data" as a second

argument to Import, like this: Import["URLstring", "Data"]. For instance, here we import a web

page showing U.S. News and World Report’s list of top liberal arts colleges.

Import[
  "http://colleges.usnews.rankingsandreviews.com/usnews/edu/college/rankings/brief/t1libartco_brief.php",
  "Data"]



The output is rather large, so we don’t show it here. One simply copies and pastes (and if necessary,

edits) the list of data values from what is imported, and uses it as desired. Here, for instance, are the

top few colleges from this page at the time of this writing; we copied the relevant data from the

Import output, and pasted it into the Grid command below (evidently Carleton and Middlebury are

tied, as are Pomona and Bowdoin):


