白白手拉手-手拉手白癜风论坛

[Life Chat] R data structures


Posted on 2017-4-13 00:01:31

Vectors
The fundamental R data structure is the vector, which stores an ordered set of values called elements. A vector can contain any number of elements, but all of the elements must be of the same type.
Several vector types are commonly used in machine learning: integer (numbers without decimals), double (numbers with decimals), character (text data), and logical (TRUE or FALSE values). There are also two special values: NULL, which indicates the absence of any value, and NA, which indicates a missing value.
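A small sketch to go with the type descriptions above (my own illustration, not from the book): typeof() reports how a value is stored, and combining different types in one vector quietly converts everything to a single type. The variable names types, mixed and missing_flags are made up for this example.

```r
# Storage type of each common vector type
types <- c(typeof(3L), typeof(98.6), typeof("John Doe"), typeof(TRUE))
print(types)           # "integer" "double" "character" "logical"

# Mixing types forces every element to a single common type
mixed <- c(1, "a", TRUE)
print(typeof(mixed))   # "character"

# NULL has no length at all; NA still occupies a position in a vector
print(length(NULL))    # 0
print(length(NA))      # 1
missing_flags <- is.na(c(98.1, NA, 101.4))
print(missing_flags)   # FALSE  TRUE FALSE
```

This is why a vector can be relied on to hold one type only: R does not reject mixed input, it coerces it.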
It is tedious to enter large amounts of data manually, but small vectors can be created by using the c() combine function. The vector can also be given a name using the <- arrow operator, which is R's way of assigning values, much like the = assignment operator is used in many other programming languages.
> subject_name <- c("John Doe", "Jane Doe", "Steve Graves")
> temperature <- c(98.1, 98.6, 101.4)
> flu_status <- c(FALSE, FALSE, TRUE)

Because R vectors are inherently ordered, the records can be accessed by counting the item's number in the set, beginning at one, and surrounding this number with square brackets (that is, [ and ]) after the name of the vector:
> temperature[2]
[1] 98.6

R offers a variety of convenient methods to extract data from vectors. A range of values can be obtained using the colon operator (:):
> temperature[2:3]
[1]  98.6 101.4
Items can be excluded by specifying a negative item number:
> temperature[-2]
[1]  98.1 101.4
Finally, it is also sometimes useful to specify a logical vector indicating whether each item should be included:
> temperature[c(TRUE, TRUE, FALSE)]
[1] 98.1 98.6
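In practice the logical vector is rarely typed by hand. A rough sketch (my addition, not from the book): a comparison produces one TRUE/FALSE per element, and that result can be used directly as the index. The cutoff of 100 and the name feverish are assumptions for illustration only.

```r
temperature <- c(98.1, 98.6, 101.4)

# A comparison returns a logical vector, one value per element
feverish <- temperature > 100
print(feverish)                # FALSE FALSE  TRUE

# The logical vector selects only the elements marked TRUE
print(temperature[feverish])   # 101.4

# which() converts the logical vector to item numbers
print(which(feverish))         # 3
```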
Factors

A factor is a special case of vector that is solely used to represent categorical or ordinal variables. In the medical dataset we are building, we might use a factor to represent gender, because it uses two categories: MALE and FEMALE.
To create a factor from a character vector, simply apply the factor() function.
> gender <- factor(c("MALE", "FEMALE", "MALE"))
> gender
[1] MALE   FEMALE MALE
Levels: FEMALE MALE

Notice that when the gender data for the three patients were displayed, R printed additional information about the gender factor. The levels variable comprises the set of possible categories the factor could take, in this case MALE or FEMALE.
When we create factors, we can add additional levels that may not appear in the data:
> blood <- factor(c("O", "AB", "A"), levels = c("O", "AB", "A", "B"))
> blood[1:2]
[1] O  AB
Levels: O AB A B

Notice that when we defined the blood factor for the three patients, we specified an additional vector of four possible blood types using the levels parameter. As a result, even though our data included only types O, AB, and A, all four types are stored with the blood factor, as indicated by the output.
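To see that the unused level really is stored, here is a short sketch (my own, not from the book) using levels(), nlevels() and table(). The names counts and codes are made up for this example.

```r
blood <- factor(c("O", "AB", "A"), levels = c("O", "AB", "A", "B"))

# All four declared levels are kept, in the order given
print(levels(blood))    # "O"  "AB" "A"  "B"
print(nlevels(blood))   # 4

# table() counts every level, including B, which never occurs in the data
counts <- table(blood)
print(counts)           # O: 1, AB: 1, A: 1, B: 0

# Internally each value is stored as the position of its level
codes <- as.integer(blood)
print(codes)            # 1 2 3
```

Declaring the levels up front keeps the category B in counts and summaries even before any patient with that blood type is recorded.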
The factor data structure also allows us to include information about the order of a nominal variable's categories, which provides a convenient way to store ordinal data.
> symptoms <- factor(c("SEVERE", "MILD", "MODERATE"), levels = c("MILD", "MODERATE", "SEVERE"), ordered = TRUE)
> symptoms
[1] SEVERE   MILD     MODERATE
Levels: MILD < MODERATE < SEVERE

The resulting symptoms factor now includes information about the order we requested. Unlike our prior factors, the levels of this factor are separated by < symbols, to indicate the presence of a sequential order from mild to severe.

A helpful feature of ordered factors is that logical tests work as you expect. For instance, we can test whether each patient's symptoms are greater than moderate:
> symptoms > "MODERATE"
[1]  TRUE FALSE FALSE
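A few more things the ordering makes possible, as a rough sketch (my addition, not from the book): other comparison operators, max(), and sort() all follow the declared level order rather than alphabetical order. The names at_least_moderate, worst and ranked are made up for this example.

```r
symptoms <- factor(c("SEVERE", "MILD", "MODERATE"),
                   levels = c("MILD", "MODERATE", "SEVERE"),
                   ordered = TRUE)

print(is.ordered(symptoms))       # TRUE

# "At least moderate" uses >= instead of >
at_least_moderate <- symptoms >= "MODERATE"
print(at_least_moderate)          # TRUE FALSE  TRUE

# max() and sort() respect the level order MILD < MODERATE < SEVERE
worst <- as.character(max(symptoms))
print(worst)                      # "SEVERE"
ranked <- as.character(sort(symptoms))
print(ranked)                     # "MILD" "MODERATE" "SEVERE"
```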
Lists

A list is a data structure, much like a vector, in that it is used for storing an ordered set of elements. However, where a vector requires all its elements to be the same type, a list allows different types of elements to be collected. Due to this flexibility, lists are often used to store various types of input and output data and sets of configuration parameters for machine learning models.
Similar to creating a vector with c(), a list is created using the list() function, as shown in the following example. One notable difference is that when a list is constructed, each component in the sequence is almost always given a name. The names are not technically required, but allow the list's values to be accessed later on by name rather than by numbered position.
> subject1 <- list(fullname = subject_name[1], temperature = temperature[1], flu_status = flu_status[1], gender = gender[1], blood = blood[1], symptoms = symptoms[1])
> subject1
$fullname
[1] "John Doe"

$temperature
[1] 98.1

$flu_status
[1] FALSE

$gender
[1] MALE
Levels: FEMALE MALE

$blood
[1] O
Levels: O AB A B

$symptoms
[1] SEVERE
Levels: MILD < MODERATE < SEVERE

Note that the values are labeled with the names we specified in the preceding command. However, a list can still be accessed using methods similar to a vector.
> subject1[2]
$temperature
[1] 98.1

The result of using vector-style operators on a list object is another list object, which is a subset of the original list.
To return a single list item in its native data type, use double brackets ([[ and ]]) when attempting to select the list component.
> subject1[[2]]
[1] 98.1
For clarity, it is often easier to access list components directly, by appending a $ and the component's name to the name of the list:
> subject1$temperature
[1] 98.1

It is possible to obtain several items in a list by specifying a vector of names.
> subject1[c("temperature", "flu_status")]
$temperature
[1] 98.1

$flu_status
[1] FALSE
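The difference between the three access styles is easiest to see by checking what each one returns. A minimal sketch (my own, not from the book), using a cut-down version of subject1 with only three components so that it runs on its own:

```r
# Reduced version of subject1, for illustration only
subject1 <- list(fullname = "John Doe", temperature = 98.1, flu_status = FALSE)

# Single brackets return a list, even for one component
print(class(subject1[2]))      # "list"

# Double brackets and $ return the component itself
print(class(subject1[[2]]))    # "numeric"
print(identical(subject1[[2]], subject1$temperature))   # TRUE

# Double brackets also accept a name
print(subject1[["fullname"]])  # "John Doe"

# Selecting by a vector of names keeps the names on the result
subset_names <- names(subject1[c("temperature", "flu_status")])
print(subset_names)            # "temperature" "flu_status"
```

Arithmetic works on subject1[[2]] or subject1$temperature, since those are plain numbers, but not on subject1[2], which is still a list.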
To be continued...
Source:
Packt, Machine Learning with R, 2nd Edition



Posted on 2017-4-13 00:10:26
What web page did you copy this from?


Posted on 2017-4-13 00:13:39
I really can't understand this.

Bullfrog is just awesome....


Posted on 2017-4-13 09:25:58
It's note-taking, just note-taking. Those who can follow it can read it; those who can't can just give it a like!


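Original poster | Posted on 2017-4-13 09:51:52
computerniu posted on 2017-4-13 00:10
What web page did you copy this from?

These are notes I typed by hand. The formatting doesn't display well and got messed up.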


Posted on 2017-4-13 09:53:54
cloudtone posted on 2017-4-13 09:51
These are notes I typed by hand. The formatting doesn't display well and got messed up.



Posted on 2017-4-13 10:50:35
What is this about?


Posted on 2017-4-13 22:56:44
Big thumbs up.