Vectors
The fundamental R data structure is the vector, which stores an ordered set of values called elements. A vector can contain any number of elements, but all of the elements must be of the same type.
Several vector types are commonly used in machine learning: integer (numbers without decimals), double (numbers with decimals), character (text data), and logical (TRUE or FALSE values). There are also two special values: NULL, which is used to indicate the absence of any value, and NA, which indicates a missing value.
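As a small illustration of the "same type" rule and of the two special values, the sketch below uses only base R functions. The variable names `mixed` and `readings` are made up for this example and are not from the book.

```r
# Each atomic vector has exactly one type
typeof(c(1L, 2L, 3L))    # "integer" (the L suffix requests an integer)
typeof(c(98.1, 98.6))    # "double"
typeof(c("a", "b"))      # "character"
typeof(c(TRUE, FALSE))   # "logical"

# Mixing types is not an error: R silently coerces every element
# to the most general type present (here, character)
mixed <- c(1, "two", TRUE)
typeof(mixed)            # "character"

# NA holds a position in the vector; NULL contributes nothing
readings <- c(98.1, NA, 101.4)
length(readings)         # 3
is.na(readings)          # FALSE TRUE FALSE
length(c(1, NULL, 2))    # 2
```

The silent coercion is worth keeping in mind: a single stray text value turns a whole numeric vector into character data.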
It is tedious to enter large amounts of data manually, but small vectors can be created by using the c() combine function. The vector can also be given a name using the <- arrow operator, which is R's way of assigning values, much like the = assignment operator is used in many other programming languages:
> subject_name <- c("John Doe", "Jane Doe", "Steve Graves")
> temperature <- c(98.1, 98.6, 101.4)
> flu_status <- c(FALSE, FALSE, TRUE)
Because R vectors are inherently ordered, the records can be accessed by counting the item's number in the set, beginning at one, and surrounding this number with square brackets (that is, [ and ]) after the name of the vector:
> temperature[2]
[1] 98.6
R offers a variety of convenient methods to extract data from vectors. A range of values can be obtained using the colon operator (:):
> temperature[2:3]
[1] 98.6 101.4
Items can be excluded by specifying a negative item number:
> temperature[-2]
[1] 98.1 101.4
Finally, it is also sometimes useful to specify a logical vector indicating whether each item should be included:
> temperature[c(TRUE, TRUE, FALSE)]
[1] 98.1 98.6
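In practice the logical vector is rarely typed by hand; it usually comes from a comparison. The sketch below recreates two of the vectors defined earlier and filters them with a condition. The name `feverish` and the 100-degree threshold are assumptions made for this example only.

```r
# Recreate the vectors from the post so this block runs on its own
subject_name <- c("John Doe", "Jane Doe", "Steve Graves")
temperature  <- c(98.1, 98.6, 101.4)

# A comparison returns one TRUE/FALSE per element
feverish <- temperature > 100    # FALSE FALSE TRUE

# The same logical vector can index any vector of equal length
temperature[feverish]            # 101.4
subject_name[feverish]           # "Steve Graves"

# which() converts the logical vector into positions
which(feverish)                  # 3

# Negative indexing and an explicit position vector give the same result
identical(temperature[-2], temperature[c(1, 3)])   # TRUE
```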
Factors
A factor is a special case of vector that is solely used to represent categorical or ordinal variables. In the medical dataset we are building, we might use a factor to represent gender, because it uses two categories: MALE and FEMALE.
To create a factor from a character vector, simply apply the factor() function:
> gender <- factor(c("MALE", "FEMALE", "MALE"))
> gender
[1] MALE FEMALE MALE
Levels: FEMALE MALE
Notice that when the gender data for the three patients were displayed, R printed additional information about the gender factor. The levels comprise the set of possible categories the factor could take, in this case MALE or FEMALE.
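The output lists FEMALE before MALE even though MALE appears first in the data. A short sketch, using only base R, of why that is: when no levels are given, factor() sorts the unique values, and each element is stored as an integer code pointing into that sorted set.

```r
gender <- factor(c("MALE", "FEMALE", "MALE"))

# Levels default to the sorted unique values of the input
levels(gender)       # "FEMALE" "MALE"

# Internally each element is an integer index into the levels
as.integer(gender)   # 2 1 2

# table() counts how many elements fall into each level
table(gender)
```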
When we create factors, we can add additional levels that may not appear in the data:
> blood <- factor(c("O", "AB", "A"), levels = c("O", "AB", "A", "B"))
> blood[1:2]
[1] O AB
Levels: O AB A B
Notice that when we defined the blood factor for the three patients, we specified an additional vector of four possible blood types using the levels parameter. As a result, even though our data included only types O, AB, and A, all four types are stored with the blood factor, as indicated by the output.
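One practical consequence of declaring the unused level can be sketched as follows: summaries keep a slot for type B with a count of zero. The call to droplevels() is an addition not covered in the quoted text; it is a base R function that discards levels with no observations.

```r
blood <- factor(c("O", "AB", "A"), levels = c("O", "AB", "A", "B"))

# All four declared levels are stored, in the order given
levels(blood)               # "O" "AB" "A" "B"
nlevels(blood)              # 4

# Unused levels still appear in counts, with a value of zero
table(blood)[["B"]]         # 0

# droplevels() removes levels that do not occur in the data
levels(droplevels(blood))   # "O" "AB" "A"
```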
The factor data structure also allows us to include information about the order of a nominal variable's categories, which provides a convenient way to store ordinal data:
> symptoms <- factor(c("SEVERE", "MILD", "MODERATE"), levels = c("MILD", "MODERATE", "SEVERE"), ordered = TRUE)
> symptoms
[1] SEVERE MILD MODERATE
Levels: MILD < MODERATE < SEVERE
The resulting symptoms factor now includes information about the order we requested. Unlike our prior factors, the levels of this factor are separated by < symbols, to indicate the presence of a sequential order from mild to severe.
A helpful feature of ordered factors is that logical tests work as you would expect. For instance, we can test whether each patient's symptoms are more severe than moderate:
> symptoms > "MODERATE"
[1] TRUE FALSE FALSE
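A few more operations that rely on the declared order are sketched below, together with a contrast against an unordered factor. This assumes base R only; suppressWarnings() is used because R warns that the comparison is not meaningful for an unordered factor.

```r
symptoms <- factor(c("SEVERE", "MILD", "MODERATE"),
                   levels = c("MILD", "MODERATE", "SEVERE"),
                   ordered = TRUE)

# Comparisons follow the level order, not alphabetical order
symptoms >= "MODERATE"           # TRUE FALSE TRUE

# max() and sort() also respect the level order
as.character(max(symptoms))      # "SEVERE"
as.character(sort(symptoms))     # "MILD" "MODERATE" "SEVERE"

# An unordered factor has no notion of "greater than": the result is NA
gender <- factor(c("MALE", "FEMALE", "MALE"))
suppressWarnings(gender > "FEMALE")   # NA NA NA
```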
Lists
A list is a data structure, much like a vector, in that it is used for storing an ordered set of elements. However, where a vector requires all its elements to be the same type, a list allows different types of elements to be collected. Due to this flexibility, lists are often used to store various types of input and output data and sets of configuration parameters for machine learning models.
Similar to creating a vector with c(), a list is created using the list() function, as shown in the following example. One notable difference is that when a list is constructed, each component in the sequence is almost always given a name. The names are not technically required, but allow the list's values to be accessed later on by name rather than by numbered position.
> subject1 <- list(fullname = subject_name[1], temperature = temperature[1], flu_status = flu_status[1], gender = gender[1], blood = blood[1], symptoms = symptoms[1])
> subject1
$fullname
[1] "John Doe"
$temperature
[1] 98.1
$flu_status
[1] FALSE
$gender
[1] MALE
Levels: FEMALE MALE
$blood
[1] O
Levels: O AB A B
$symptoms
[1] SEVERE
Levels: MILD < MODERATE < SEVERE
Note that the values are labeled with the names we specified in the preceding command. However, a list can still be accessed using methods similar to a vector:
> subject1[2]
$temperature
[1] 98.1
The result of using vector-style operators on a list object is another list object, which is a subset of the original list.
To return a single list item in its native data type, use double brackets ([[ and ]]) when attempting to select the list component:
> subject1[[2]]
[1] 98.1
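The difference between single and double brackets matters as soon as the value is used in a calculation. The sketch below uses a shortened version of subject1 with three components so that it runs on its own; the shortened list is an assumption for this example, not the full record built earlier.

```r
# A reduced stand-in for the subject1 list from the post
subject1 <- list(fullname = "John Doe", temperature = 98.1, flu_status = FALSE)

# Single brackets return a list of length one
class(subject1[2])       # "list"

# Double brackets return the stored value itself
class(subject1[[2]])     # "numeric"

# Arithmetic works on the extracted value
subject1[[2]] + 1        # 99.1
# subject1[2] + 1 would stop with an error, because a list is not a number

# Double brackets also accept a name
subject1[["temperature"]]   # 98.1
```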
For clarity, it is often easier to access list components directly, by appending a $ and the component's name to the name of the list:
> subject1$temperature
[1] 98.1
It is possible to obtain several items in a list by specifying a vector of names:
> subject1[c("temperature", "flu_status")]
$temperature
[1] 98.1
$flu_status
[1] FALSE
To be continued...
Source:
Packt, Machine Learning with R, 2nd Edition