<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html> <head> <!-- Generated by HsColour, http://code.haskell.org/~malcolm/hscolour/ --> <title>Text/CSV.hs</title> <link type='text/css' rel='stylesheet' href='hscolour.css' /> </head> <body> <pre><a name="line-1"></a><span class='hs-comment'>{- | <a name="line-2"></a> module: Text.CSV <a name="line-3"></a> license: MIT <a name="line-4"></a> maintainer: Jaap Weel &lt;weel at ugcs dot caltech dot edu&gt; <a name="line-5"></a> stability: provisional <a name="line-6"></a> portability: ghc <a name="line-7"></a> <a name="line-8"></a> This module parses and dumps documents that are formatted more or <a name="line-9"></a> less according to RFC 4180, \"Common Format and MIME Type for <a name="line-10"></a> Comma-Separated Values (CSV) Files\", <a name="line-11"></a> &lt;<a href="http://www.rfc-editor.org/rfc/rfc4180.txt">http://www.rfc-editor.org/rfc/rfc4180.txt</a>&gt;. <a name="line-12"></a> <a name="line-13"></a> There are some issues with this RFC. I will describe what these <a name="line-14"></a> issues are and how I deal with them. <a name="line-15"></a> <a name="line-16"></a> First, the RFC prescribes CRLF standard network line breaks, but <a name="line-17"></a> you are likely to run across CSV files with other line endings, so <a name="line-18"></a> we accept any sequence of CRs and LFs as a line break. <a name="line-19"></a> <a name="line-20"></a> Second, there is an optional header line, but the format for the <a name="line-21"></a> header line is exactly like a regular record and you can only <a name="line-22"></a> figure out whether it exists from the mime type, which may not be <a name="line-23"></a> available. I ignore the issues of header lines and simply turn them <a name="line-24"></a> into regular records. 
<a name="line-25"></a> <a name="line-26"></a> Third, there is an inconsistency, in that the formal grammar <a name="line-27"></a> specifies that fields can contain only certain US ASCII characters, <a name="line-28"></a> but the specification of the MIME type allows for other character <a name="line-29"></a> sets. I will allow all characters in fields, except for commas, CRs <a name="line-30"></a> and LFs in unquoted fields. This should make it possible to parse <a name="line-31"></a> CSV files in any encoding, but it allows for characters such as <a name="line-32"></a> tabs that the RFC may be interpreted to forbid even in non-US-ASCII <a name="line-33"></a> character sets. <a name="line-34"></a> <a name="line-35"></a> NOTE: Several people have asked me to implement extensions that are <a name="line-36"></a> used in non-US versions of Microsoft Excel. This library implements <a name="line-37"></a> RFC-compliant CSV, not Microsoft Excel CSV. If you want to write a <a name="line-38"></a> library that deals with the CSV-like formats used by non-US versions <a name="line-39"></a> of Excel or any other software, you should write a separate library. I <a name="line-40"></a> suggest you call it Text.SSV, for "Something Separated Values." <a name="line-41"></a>-}</span> <a name="line-42"></a> <a name="line-43"></a><span class='hs-comment'>{- Copyright (c) Jaap Weel 2007. 
Permission is hereby granted, free <a name="line-44"></a>of charge, to any person obtaining a copy of this software and <a name="line-45"></a>associated documentation files (the "Software"), to deal in the <a name="line-46"></a>Software without restriction, including without limitation the rights <a name="line-47"></a>to use, copy, modify, merge, publish, distribute, sublicense, and/or <a name="line-48"></a>sell copies of the Software, and to permit persons to whom the <a name="line-49"></a>Software is furnished to do so, subject to the following conditions: <a name="line-50"></a>The above copyright notice and this permission notice shall be <a name="line-51"></a>included in all copies or substantial portions of the Software. THE <a name="line-52"></a>SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR <a name="line-53"></a>IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF <a name="line-54"></a>MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND <a name="line-55"></a>NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE <a name="line-56"></a>LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION <a name="line-57"></a>OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION <a name="line-58"></a>WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
-}</span> <a name="line-59"></a> <a name="line-60"></a><span class='hs-keyword'>module</span> <span class='hs-conid'>Text</span><span class='hs-varop'>.</span><span class='hs-conid'>CSV</span> <span class='hs-layout'>(</span><span class='hs-conid'>CSV</span> <a name="line-61"></a> <span class='hs-layout'>,</span> <span class='hs-conid'>Record</span> <a name="line-62"></a> <span class='hs-layout'>,</span> <span class='hs-conid'>Field</span> <a name="line-63"></a> <span class='hs-layout'>,</span> <span class='hs-varid'>csv</span> <a name="line-64"></a> <span class='hs-layout'>,</span> <span class='hs-varid'>parseCSV</span> <a name="line-65"></a> <span class='hs-layout'>,</span> <span class='hs-varid'>parseCSVFromFile</span> <a name="line-66"></a> <span class='hs-layout'>,</span> <span class='hs-varid'>parseCSVTest</span> <a name="line-67"></a> <span class='hs-layout'>,</span> <span class='hs-varid'>printCSV</span> <a name="line-68"></a> <span class='hs-layout'>)</span> <span class='hs-keyword'>where</span> <a name="line-69"></a> <a name="line-70"></a><span class='hs-keyword'>import</span> <span class='hs-conid'>Text</span><span class='hs-varop'>.</span><span class='hs-conid'>ParserCombinators</span><span class='hs-varop'>.</span><span class='hs-conid'>Parsec</span> <a name="line-71"></a><span class='hs-keyword'>import</span> <span class='hs-conid'>Data</span><span class='hs-varop'>.</span><span class='hs-conid'>List</span> <span class='hs-layout'>(</span><span class='hs-varid'>intersperse</span><span class='hs-layout'>)</span> <a name="line-72"></a> <a name="line-73"></a><a name="CSV"></a><span class='hs-comment'>-- | A CSV file is a series of records. According to the RFC, the</span> <a name="line-74"></a><a name="CSV"></a><span class='hs-comment'>-- records all have to have the same length. 
As an extension, I</span> <a name="line-75"></a><a name="CSV"></a><span class='hs-comment'>-- allow variable length records.</span> <a name="line-76"></a><a name="CSV"></a><span class='hs-keyword'>type</span> <span class='hs-conid'>CSV</span> <span class='hs-keyglyph'>=</span> <span class='hs-keyglyph'>[</span><span class='hs-conid'>Record</span><span class='hs-keyglyph'>]</span> <a name="line-77"></a> <a name="line-78"></a><a name="Record"></a><span class='hs-comment'>-- | A record is a series of fields</span> <a name="line-79"></a><a name="Record"></a><span class='hs-keyword'>type</span> <span class='hs-conid'>Record</span> <span class='hs-keyglyph'>=</span> <span class='hs-keyglyph'>[</span><span class='hs-conid'>Field</span><span class='hs-keyglyph'>]</span> <a name="line-80"></a> <a name="line-81"></a><a name="Field"></a><span class='hs-comment'>-- | A field is a string</span> <a name="line-82"></a><a name="Field"></a><span class='hs-keyword'>type</span> <span class='hs-conid'>Field</span> <span class='hs-keyglyph'>=</span> <span class='hs-conid'>String</span> <a name="line-83"></a> <a name="line-84"></a><a name="csv"></a><span class='hs-comment'>-- | A Parsec parser for parsing CSV files</span> <a name="line-85"></a><span class='hs-definition'>csv</span> <span class='hs-keyglyph'>::</span> <span class='hs-conid'>Parser</span> <span class='hs-conid'>CSV</span> <a name="line-86"></a><span class='hs-definition'>csv</span> <span class='hs-keyglyph'>=</span> <span class='hs-keyword'>do</span> <span class='hs-varid'>x</span> <span class='hs-keyglyph'>&lt;-</span> <span class='hs-varid'>record</span> <span class='hs-varop'>`sepEndBy`</span> <span class='hs-varid'>many1</span> <span class='hs-layout'>(</span><span class='hs-varid'>oneOf</span> <span class='hs-str'>"\n\r"</span><span class='hs-layout'>)</span> <a name="line-87"></a> <span class='hs-varid'>eof</span> <a name="line-88"></a> <span class='hs-varid'>return</span> <span class='hs-varid'>x</span> <a 
name="line-89"></a> <a name="line-90"></a><a name="record"></a><span class='hs-definition'>record</span> <span class='hs-keyglyph'>::</span> <span class='hs-conid'>Parser</span> <span class='hs-conid'>Record</span> <a name="line-91"></a><span class='hs-definition'>record</span> <span class='hs-keyglyph'>=</span> <span class='hs-layout'>(</span><span class='hs-varid'>quotedField</span> <span class='hs-varop'>&lt;|&gt;</span> <span class='hs-varid'>field</span><span class='hs-layout'>)</span> <span class='hs-varop'>`sepBy`</span> <span class='hs-varid'>char</span> <span class='hs-chr'>','</span> <a name="line-92"></a> <a name="line-93"></a><a name="field"></a><span class='hs-definition'>field</span> <span class='hs-keyglyph'>::</span> <span class='hs-conid'>Parser</span> <span class='hs-conid'>Field</span> <a name="line-94"></a><span class='hs-definition'>field</span> <span class='hs-keyglyph'>=</span> <span class='hs-varid'>many</span> <span class='hs-layout'>(</span><span class='hs-varid'>noneOf</span> <span class='hs-str'>",\n\r\""</span><span class='hs-layout'>)</span> <a name="line-95"></a> <a name="line-96"></a><a name="quotedField"></a><span class='hs-definition'>quotedField</span> <span class='hs-keyglyph'>::</span> <span class='hs-conid'>Parser</span> <span class='hs-conid'>Field</span> <a name="line-97"></a><span class='hs-definition'>quotedField</span> <span class='hs-keyglyph'>=</span> <span class='hs-varid'>between</span> <span class='hs-layout'>(</span><span class='hs-varid'>char</span> <span class='hs-chr'>'"'</span><span class='hs-layout'>)</span> <span class='hs-layout'>(</span><span class='hs-varid'>char</span> <span class='hs-chr'>'"'</span><span class='hs-layout'>)</span> <span class='hs-varop'>$</span> <a name="line-98"></a> <span class='hs-varid'>many</span> <span class='hs-layout'>(</span><span class='hs-varid'>noneOf</span> <span class='hs-str'>"\""</span> <span class='hs-varop'>&lt;|&gt;</span> <span class='hs-varid'>try</span> <span 
class='hs-layout'>(</span><span class='hs-varid'>string</span> <span class='hs-str'>"\"\""</span> <span class='hs-varop'>>></span> <span class='hs-varid'>return</span> <span class='hs-chr'>'"'</span><span class='hs-layout'>)</span><span class='hs-layout'>)</span> <a name="line-99"></a> <a name="line-100"></a><a name="parseCSV"></a><span class='hs-comment'>-- | Given a file name (used only for error messages) and a string to</span> <a name="line-101"></a><span class='hs-comment'>-- parse, run the parser.</span> <a name="line-102"></a><span class='hs-definition'>parseCSV</span> <span class='hs-keyglyph'>::</span> <span class='hs-conid'>FilePath</span> <span class='hs-keyglyph'>-></span> <span class='hs-conid'>String</span> <span class='hs-keyglyph'>-></span> <span class='hs-conid'>Either</span> <span class='hs-conid'>ParseError</span> <span class='hs-conid'>CSV</span> <a name="line-103"></a><span class='hs-definition'>parseCSV</span> <span class='hs-keyglyph'>=</span> <span class='hs-varid'>parse</span> <span class='hs-varid'>csv</span> <a name="line-104"></a> <a name="line-105"></a><a name="parseCSVFromFile"></a><span class='hs-comment'>-- | Given a file name, read from that file and run the parser</span> <a name="line-106"></a><span class='hs-definition'>parseCSVFromFile</span> <span class='hs-keyglyph'>::</span> <span class='hs-conid'>FilePath</span> <span class='hs-keyglyph'>-></span> <span class='hs-conid'>IO</span> <span class='hs-layout'>(</span><span class='hs-conid'>Either</span> <span class='hs-conid'>ParseError</span> <span class='hs-conid'>CSV</span><span class='hs-layout'>)</span> <a name="line-107"></a><span class='hs-definition'>parseCSVFromFile</span> <span class='hs-keyglyph'>=</span> <span class='hs-varid'>parseFromFile</span> <span class='hs-varid'>csv</span> <a name="line-108"></a> <a name="line-109"></a><a name="parseCSVTest"></a><span class='hs-comment'>-- | Given a string, run the parser, and print the result on stdout.</span> <a 
name="line-110"></a><span class='hs-definition'>parseCSVTest</span> <span class='hs-keyglyph'>::</span> <span class='hs-conid'>String</span> <span class='hs-keyglyph'>-></span> <span class='hs-conid'>IO</span> <span class='hs-conid'>()</span> <a name="line-111"></a><span class='hs-definition'>parseCSVTest</span> <span class='hs-keyglyph'>=</span> <span class='hs-varid'>parseTest</span> <span class='hs-varid'>csv</span> <a name="line-112"></a> <a name="line-113"></a><a name="printCSV"></a><span class='hs-comment'>-- | Given an object of type CSV, generate a CSV formatted</span> <a name="line-114"></a><span class='hs-comment'>-- string. Always uses escaped fields.</span> <a name="line-115"></a><span class='hs-definition'>printCSV</span> <span class='hs-keyglyph'>::</span> <span class='hs-conid'>CSV</span> <span class='hs-keyglyph'>-></span> <span class='hs-conid'>String</span> <a name="line-116"></a><span class='hs-definition'>printCSV</span> <span class='hs-varid'>records</span> <span class='hs-keyglyph'>=</span> <span class='hs-varid'>unlines</span> <span class='hs-layout'>(</span><span class='hs-varid'>printRecord</span> <span class='hs-varop'>`map`</span> <span class='hs-varid'>records</span><span class='hs-layout'>)</span> <a name="line-117"></a> <span class='hs-keyword'>where</span> <span class='hs-varid'>printRecord</span> <span class='hs-keyglyph'>=</span> <span class='hs-varid'>concat</span> <span class='hs-varop'>.</span> <span class='hs-varid'>intersperse</span> <span class='hs-str'>","</span> <span class='hs-varop'>.</span> <span class='hs-varid'>map</span> <span class='hs-varid'>printField</span> <a name="line-118"></a> <span class='hs-varid'>printField</span> <span class='hs-varid'>f</span> <span class='hs-keyglyph'>=</span> <span class='hs-str'>"\""</span> <span class='hs-varop'>++</span> <span class='hs-varid'>concatMap</span> <span class='hs-varid'>escape</span> <span class='hs-varid'>f</span> <span class='hs-varop'>++</span> <span 
class='hs-str'>"\""</span> <a name="line-119"></a> <span class='hs-varid'>escape</span> <span class='hs-chr'>'"'</span> <span class='hs-keyglyph'>=</span> <span class='hs-str'>"\"\""</span> <a name="line-120"></a> <span class='hs-varid'>escape</span> <span class='hs-varid'>x</span> <span class='hs-keyglyph'>=</span> <span class='hs-keyglyph'>[</span><span class='hs-varid'>x</span><span class='hs-keyglyph'>]</span> <a name="line-121"></a> <span class='hs-varid'>unlines</span> <span class='hs-keyglyph'>=</span> <span class='hs-varid'>concat</span> <span class='hs-varop'>.</span> <span class='hs-varid'>intersperse</span> <span class='hs-str'>"\n"</span> <a name="line-122"></a> </pre></body> </html>