标签存档: C#

Surprises in Scala’s “Uniform Return Type” Magic

There’s an increasingly heated debate about Scala’s complexity. A recent post talks about Scala’s “true complexity“: the terrifying intricacy you will encounter, if you try to extend Scala’s Collections Library with a method that achieves the “perfect” behavior.

Yet it also might be worth asking: what are the motivations of these “perfect” behaviors? Are they always desirable?

One important aspect of the “perfect” collection behavior is the magical uniform return type principle. Sloppily speaking, this means that transforming a collection should give you another collection of the same kind. For instance:

// Filtering a List gives you a filtered List
scala> List(1,2,3,4).filter(_%2==0)
res: List[Int] = List(2, 4)

// Mapping a Set gives you a mapped Set
scala> Set(1,2,3,4).map(1+)
res: scala.collection.immutable.Set[Int] = Set(2, 3, 4, 5)

// Uniform return type at both compile time and runtime
scala> val iterable : Iterable[Int] = Set(1,2,3,4)
iterable: Iterable[Int] = Set(1, 2, 3, 4)

scala> iterable.map(1+)
res: Iterable[Int] = Set(2, 3, 4, 5)

// Works for Strings, too
scala> "E is the most frequent letter in English".map(
     |    c => if(c=='e'|c=='E') '3' else c)
res: String = 3 is th3 most fr3qu3nt l3tt3r in 3nglish

Clean and consistent, can’t be more intuitive. Of course filtering a List should return a List, and mapping a Set should give you a Set. Both compile time and runtime. Works even for non-collections like Strings.

When I first saw this, I thought “yeah, cool”, and then forgot about it. I took it for granted as just another Scala magic that makes my life easier, without realizing its intricate implications. Not until recently have I realized that Scala’s collections were much smarter, or trickier, than I had expected.

The uniform return type principle is fairly straightforward for operations like filter, take, drop and slice, but situations get complicated in the cases of some other operations like map and flatMap. Consider the following:

scala> Set(1,2,3,4).map(_ % 2 == 0)
res: scala.collection.immutable.Set[Boolean] = Set(false, true)

It starts getting interesting. When mapping a Set, you should get a Set, which by definition doesn’t contain repeated values. So you get a Set[Boolean] which consists of only 2 values. You might get surprised here if you are familiar with LINQ in C#. In C#

new HashSet(new[] {1, 2, 3, 4}).Select(x => x%2 == 0)

gives you an IEnumerable that consists of “false, true, false, true” in order, because the HashSet is simply iterated through as a Normal IEnumerable. Some say C#/LINQ sucks, because the type information of original collection is “lost”.

In Scala, Map[K, V] is a subtype of Iterable[(K, V)], thus can be seen as a collection of key-value pairs. Now stop for a moment and think about it: what happens if you map a Map?

scala> Map('a'->1,'b'->2,'c'->3,'d'->4).map(t => t._2 -> t._1)
res: scala.collection.immutable.Map[Int,Char] = Map(1 -> a, 2 -> b, 3 -> c, 4 -> d)

scala> Map('a'->0,'b'->1,'c'->0,'d'->1).map(t => t._1)
res: scala.collection.immutable.Iterable[Char] = List(a, b, c, d)

Ideally, according to the uniform return type principle, mapping a map should give you another Map. But this is only possible when the target element type is key-value pair. Otherwise, the best we can do is to treat the Map as just an Iterable of key-value pairs and return an List of the target elements.

However, there is still the problem of inconsistency. Here’s an example:

scala> Map(1->'a', 2->'b', 3->'c').map(_._2)
res: scala.collection.immutable.Iterable[Char] = List(a, b, c)

scala> Map(1->('a','b'), 2->('a','c'), 3->('b','c')).map(_._2)
res: scala.collection.immutable.Map[Char,Char] = Map(a -> c, b -> c)

Same operation on two Sets, but the behaviors are totally different, simply because these two Sets have different types of values? This is weird. The problem is that 2-tuples are selected to represent the key-value pairs, but are not exclusively used for this purpose. So when user map the elements to 2-tuples, there’s this unavoidable ambiguity.

It becomes even more error-prone if the actual class type of the collection is not known at compile time. We know that Map has a property “keys” that gives you an Iterable of its keys. Consider the following example:

scala> val m = Map(1 -> 'a', 2 -> 'b', 3 -> 'c', 4 -> 'd')
m: scala.collection.immutable.Map[Int,Char] = Map(1 -> a, 2 -> b, 3 -> c, 4 -> d)

scala> m.keys.map(_%2==0)
res: Iterable[Boolean] = Set(false, true)

Ah…the runtime type of “keys” is Set[Int], so repeated values in the returned collection are also ignored. The behavior should surprise you if you haven’t think of this. In fact, when given a Iterable without knowing its actual type, in any case the behaviors of collection operations are simply unpredictable. Scala guys can argue that this is just how it works, but this doesn’t seems to be a good thing, given that Iterables have been widely used between interfaces to pass data collections around without specifying their internal structures.

To sum up, the behavior of the map operation depends on not only the type of the collection on which it invokes, both static and dynamic, but also on the type of the mapped elements.

How is this magic implemented? If you take a look at the source code, you will probably see the most complicated collection library you’ve ever seen in your life. I’ll try to give a brief and simplified explanation. Here is the actual signature of the map method:

def map[MappedElem, That](p: Elem => MappedElem)
    (implicit bf: CanBuildFrom[This, MappedElem, That]): That

As you can see there’s an extra implicit parameter “bf” of type “CanBuildFrom[This, MappedElem, That]“, which will give you a “Builder[MappedElem, That]” that can build a collection (of type “That”) from the mapped elements (of type “MappedElem”). In short, “CanBuildFrom[This, MappedElem, That]” is a factory for “Builder[MappedElem, That]“, which itself is a factory for “That”. When both type parameters “This” and “MappedElem” are given, the compiler can find the best eligible for “bf” (according to some perplexing rules) and consequently determines the static type of “That”. For example:

// bf : CanBuildFrom[Map[_,_], (Char, Int), Map[Char, Int]]
Map(1 -> ‘a’, 2 -> ‘b’).map(t => t._2 -> t._1)

// bf: CanBuildFrom[Iterable[_], Int, Iterable[Int]]
Map(1 -> ‘a’, 2 -> ‘b’).map(_._1)

The “CanBuildFrom” can forward the call to the “genericBuilder[MappedElem]” method of the collection inferred in compile time, so that the “right” runtime type can be selected via virtual dispatch.

Now you should have noticed the conceptual and implementation complexity brought by the seemingly simple “uniform return type principle”. But why do we need this at the first place?

Let’s digress to talk about LINQ for a moment. The argument “LINQ sucks, because the type information of original collection is lost” doesn’t really stand, because I think this is exactly what it was designed to ignore. For most of the times, we often don’t care about whether the Iterable is a Seq or a Set or a Map or whatever. What we do with a collection is often just querying and consuming its content. LINQ provides a uniform interface with consistent, lazy behavior to facilitate this. When you need a collection of certain type, you explicitly convert the query result to it. When you need special behaviors of a Set or Map (Dictionary), you use their own interfaces. But queries are queries, different collections are treated equally. Such simplicity is not a weakness, but a feature.

In my point of view, LINQ and Scala’s collection operation are cosmetically similar but semantically different. LINQ queries a collection; Scala’s collection operation transforms a collection. (This may give you the clue why LINQ doesn’t have a ForEach extension method.) Scala’s functional nature necessitates the collection transformation semantic: you need to be able to deal with the value of a collection and compute a new value from it. And since Scala is also statically and strongly typed, the “uniform return type principle” is required to make the system consistent. Thus the behaviors of collection operations should of course depend on the actual collection type and the type of the target element. The operations are non-lazy (strict) by default to avoid unintended delay of the side-affects within the operations (there’s a more detailed explanation about the decision on laziness at the bottom of this page).

What if we only want to query and consume the content of the collection, like we do in LINQ? I suggest using Iterator directly or convert the collection to Stream:

scala> Set(1,2,3,4).iterator.map(_%2).foreach(print)
1010
scala> Set(1,2,3,4).toStream.map(_%2).foreach(print)
1010

What comes with the complexity is great flexibility and power. Scala is expressive and seems quite easy, but the paradox is that you need to spend much time learning to avoid misusing it. Scala is not easy, even in the case of these basic collection operations; it’s important to recognize that.

读取RFC822格式日期时间的类[C#]

比较无法理解为什么 .NET 类库里面没有提供足够对RFC822格式日期时间的支持。网上有人实现,但是都不太让人满意。比较囧的是,关于单字母表示时区,好几个实现都只考虑了 Z、A、M、N、Y 五个 spec 里直接列出的字母……我在用的 Argotic 也是这样。所以就自己实现了一个 parser,然后修改 Argotic 的代码直接用它。

这里下载源代码及其单元测试。需要的话就拿去吧,lgpl。

我实现的时候相比严格的 spec 又放宽了一些,比如允许小时、分钟、秒只有一位数字(spec 要求必须两位),允许不指定时区,允许用四位数字表示年份(这个是 rss 的 spec 要求的),允许只指定日期不指定时间。如果你觉着不爽就自己改改吧,看代码就知道非常好改(就在“Set Format Strings”那个 region 里)。

欢迎报告bug。

敏捷开发学习笔记(思维导图)

终于看完了《Agile Principles, Patterns, and Practices in C#》的第一个Section,上学期间读这些书的进度还真不是一般的慢……第一次尝试用MindMap做笔记,感觉不错,不过效率还是不够高。老蒋说我笔记做得太详尽了,像抄书…反思中。

篇幅所限,书中还是有很多内容不够深入。一些内容需要另一本书的篇幅深入讨论(如重构),甚至需要看一本书才能从纸上谈兵转到实践(如TDD、Acceptance Tests)。。。望不到头的书单啊T_T

做笔记使用的工具是 XMind

初来北航一个多月的零零碎碎

http://upload.tomtung.com/img/reminder_look_at_reminder_wall.jpg

几乎持续忙碌中。得空扯两句。

除了多了不少限制,生活和来之前差不多。食堂挺好。大运村很贵,不过想想也就4年而已,也就无所谓了。没阳台,很不方便。房间里有小强出没让人很不爽,不过比去年在福州一中见到的要小多了,也没那么嚣张。还好能有自己的书桌,用无线路由上着网,看着书,其实还是很惬意的。恩……我其实还是很容易满足的。

室友都很好很强大,让我觉得笨鸟先飞格外必要。有点后悔假期没有看看数学,现在一周五节数学(三节高数两节离散)让人有点吃不消。尤其是高数,大半年没有怎么做过这种题,现在看着它们感觉脑子像卡住了一样……英语依旧荒废中,腾不出手来处理,所幸 RP 足够好进了 B 班。专业方面,终于还是失去了第二遍读完 C++ Primer 的耐心,这份读书笔记也不知何年何月才能完成了。转而开始看 Effective C++,看完那三本 Effective 就不再在 C++ 上过多纠缠了,什么模板元之类的都省省吧……这门语言的复杂程度已经足够让我望而却步了,抱着厚书啃啊啃也啃不完总不是办法,还是今后在实践中逐渐积累吧。《Pro C# with .NET 3.0 Special Edition》刚到第八章,《Head First Design Patterns》也到第八章。它们的阅读进度都由于要顾及数学被严重拖慢了,很让人郁闷。至于其它七七八八的科目,什么语文啊计算机导论啊历史啊政治啊航空航天概论啊,都在摸着石头瞎混中,但愿能混过去……

没参加任何类似学生会或者班委会的组织。社团仅仅参加了 MSTC(微软技术俱乐部),并且顺利混进 .NET 组。据冬冬说,我在面试时极为成功的装X竟然让某巨牛印象深刻并颇为赞许,这让我相当有成就感。如果顺利,不太久会参与项目开发。看来我放假时决定学 C# 还算明智。现在感觉 MSTC 还算不错。

本来准备参加的 GC(Google Camp)也顺利入围技术部,不过了解更多以后觉得很不靠谱,暂且退出免得浪费时间。

ACM/ICPC 终于还是没有去参加……恩,连新生的选拔赛都干脆没去。虽然这个决定是早在年初就做好的,心中还是免不了一丝不舍。希望我没有做出错误的选择。

之前参加了一次 Liunx of BUAA 北航小聚,用 Vista 的我只能拿本里的 andLinux 充数。群牛闪耀,气氛很融洽,更重要的是我和冬冬很幸运地遇到 Jesse,他是在北航举行的08年 Gnome 亚洲峰会主页)北航方面的负责人,然后我们就高高兴兴地成了志愿者。记得是蒋天正还是谁给我们说,你们刚来没几天就能参加国际会议了~~ 作为技术上太水的人,我也就只能搬搬桌子、指个路什么的,以此为开源社区做点贡献了。顺带一提,这样一个技术会议的组织者竟然是 Emily 姐和 Pockey 姐两位女性,真是奇妙。会议很成功,这里有n多大家拍的照片。以此为契机,北航的开源社团似乎也即将成立,真是件好事。

一段时间听了很多碟。周董的新专辑还不如上一张,周导演果然没时间玩音乐了;JJ的新专辑安排很杂,不过整体质量还算不错。重新开始听原来没什么感觉的门德尔松无词歌。又开始迷巴赫,重听了以前没有特别留心感受的曲子,从大无开始,到小无,然后是钢琴的帕提塔、意大利协奏曲等等,并且各同时找好几个版本对照。上课路上看着阳光透过教学区树木的树冠,随着欢欣的库兰特舞跳动,或者眯起眼睛依着一首萨拉班德打一会儿瞌睡,觉得生活真是美好。托卡塔从耳旁流过时似乎可以提神,严谨紧凑的赋格似乎也会潜移默化地帮助思考。温暖且踏实的大调诠释着幸福,清冷或宁静的小调表达着自省,二者交替间很容易就在我的心室中共鸣回响起来。从不滥情的巴赫总能让人在忙碌中保持安稳,我每过一段时间似乎都会到巴赫这里停靠。每次少则一周多则一两个月,总能得到珍贵的体验。

恩……似乎这段时间值得说的就是这么多。瞎忙之中一切看起来都在向着好的方向发展,一些小麻烦还不足以影响心情。希望一切顺利。

最后,祝贺银男 Ghost 终于银了。祝高三各位大牛保送顺利。欢迎报考 BUAA。

写在临行前

昨天终于顺利闹完了,20号开学,今晚的火车,明天到。

如果从年初学校定下来开始算起,到现在,那真是有大半年的假期了。虽然基本上一直都是忙忙碌碌的,但是效率并不咋高。假期开始的时候没写什么假期计划,现在就写点总结吧。

E文的重要性我是清楚的,自己清楚确实需要加强。原来有人(好吧,我坦白,包括我自己)觉得我英语不错,其实现在客观地看,不过是几句简单的说得比较溜,能吓唬下不了解情况的人而已,其实在基础各方面问题非常多。

我差不多是从今年4月份开始重新拾起E文的。兰州找不到合适的班,就自己跟着一个叫《彭蒙惠英语》的杂志走。这个是空中英语教室系列的高级版,对我来说比较吃力,但是中级版又太EZ,只好勉强用这本,结果每个月甚至都没有精力看完当月的内容。4~8月读过的课文篇数分别为2、6、7、8、1,可见利用率并不高。

当初其实规划得挺好:每天都安排课文和配套广播,可以练阅读和听力;遇到的生词记下来可以提升词汇量;另外还在网上找到了语音讨论组可以练习口语。这样就可以全面提高了。但是对我来说杂志内容难度实在偏大,每天都要花费不少时间。每天如果读课文至少需要半小时,预习半小时,广播节目半小时。口语练习一小时。生词仅一共背了 2000左右(更不幸的是现在已经忘掉不少了),难度等级从小学初中到专八GRE都有(可见我野路子学上来的基础确实有问题),每天按照软件安排复习和初记至少半小时。这样算下来单单是英文每天就要最少花费3小时,哪天稍微耽搁后面进度就跟不上了,煞是郁闷,到最后就干脆坚持不下来了。现在还是没有真正找到合适的英语学习方法,比较愁,哪位同学有什么建议欢迎在下面评论。

数学看得不多,除去已经忘掉的就可以忽略不计了……

专业课方面,主要是看完了一本C#和对象建模的入门书,以及C++ Primer。前者非常基础,需要深入起码还要看好几本。后者,正如我在这里所说的,花了超出预期很多的时间才看完第一遍,第二遍也没能在开学前看完。本来原计划是在开学前不但看完Primer,还要看完那三本Effective的,彻底搞定明年新标准发布前我需要掌握的C++内容。已经完成的部分和计划相比差太多了。

学会了游泳。

自学一点钢琴,教材用599和哈农,从高考开始学到现在,各只弹到第15首。买的琴是Yamaha DGX-620,本来想带去学校继续学的,结果发现我根本没有理解20公斤意味着什么……况且宿舍小也不一定能放得下。原来所谓Portable Grand的意思仅仅是相对Grand稍微Portable 一点而已了。查了下其它的电钢,似乎真正比较Portable的键盘都不带配重……那就先算了吧。

恩……大致就是以上这些了。Linda姐问我假期最大的成果是什么,我说不知道……各项都不满意。忙忙碌碌但不出效果。其实把C++和E文放在一起的安排是很失败的,一天到晚都在看语言上拉拉杂杂的东西,搞得人又烦又低效。。。

忙忙碌碌的,看起来那么长的一个假期也就这样过去了。昨天去一中,见了老赵、DF、野牛、格格他们,又在楼道里习惯性闲逛,在二球下习惯性驻足,在公交站台上习惯性无意义等待。即将满载着回忆离开,除了少了很多熟识的面孔,那里的一切都还是一样熟悉。

之前同学陆陆续续都走差不多了,并没有太多离别的伤感。我只是觉得大家保持着联系就好,有缘也肯定还会再见。我似乎总是习惯看好现在,看好将来,并不太多留恋身后。想想旅行时我不喜欢拍很多照片留念可能也是这个原因吧。是豁达,是冷漠,还是不知珍惜,我自己无从分辨。我一直自以为是前者,只是昨天从姥姥家回来才产生了怀疑。

姥姥快八十岁了,坚持要在我走之前看看我。她对我絮絮叨叨了很多已经说过很多遍的事情,然后让我常给她打电话。我们都笑着说姥姥拿起电话就豆腐三碗、三碗豆腐说个没完没了了,长途话费下来肯定把人愁死。姥姥摇着头说,她知道长途花费贵,又说我学习紧张,又说宿舍里那么多人也影响别人,只要我每周周末晚上打个电话就行了,她就和我说一句话,就一句,说好着呢,听听我的声音马上就挂掉……一边说用手掌抹着已经发红的眼眶。我们赶紧解释说长途花费是按时间计的,没有那么贵,学习也并没有那么紧张,也不会打扰到别人。姥姥还是摇着头说没事的时候多休息不要打电话,每周打一次电话,说一句话听听我的声音就行了,就一句话就行了……絮叨地这么说了很多遍。

我有些不知所措,突然间感到亲人对自己如此强烈的不舍。看着姥姥发红的双眼,我突然觉得不忍离去。这是一种我其实一直没有真正明白的情感,曾经很多老师对此的解释都是“等你们做父母、祖父母了就明白了”。作为一个自私的独生子女,我想我并没有真正懂得什么是爱。

我只是愣在那里,不断提醒自己记得多打几个电话。除此之外我不知道还能做到什么。

Unable To Stay, Unwilling To Leave. 不管怎样,几天以后就是完全不同的生活了。


标签云

豆瓣